library(knitr)
source("../R/SFA.ExtractTopFeatures.R")

We annotate the top genes driving each factor from the sparse factor analysis (SFA) of the GTEx data.

GTEx 2013 Factor analysis (sparse loadings: sqrt counts)

# Read the SFA outputs; the F matrix is transposed after reading.
lambda_out <- read.table("../sfa_outputs/GTEX2013/counts_sqrt_gtex/counts_sqrt_gtex_lambda.out")
f_out <- t(read.table("../sfa_outputs/GTEX2013/counts_sqrt_gtex/counts_sqrt_gtex_F.out"))

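# Read the gene identifiers as a plain character vector.
gene_names <- as.vector(as.matrix(read.table("../sfa_inputs/gene_names_GTEX_V6.txt")))
# Keep only the first 15 characters of each identifier.
gene_names <- substring(gene_names, 1, 15)
xli <- gene_names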
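The `substring(gene_names, 1, 15)` call keeps exactly the 15 characters of a bare Ensembl gene ID (`ENSG` plus 11 digits). A minimal sketch of what that step does, assuming the input file stores versioned IDs of the form `ENSG...<dot><version>`; the contents of that file are not shown here, and the version suffixes below are made up for illustration:

```r
# Hypothetical versioned IDs; the ".9" and ".8" suffixes are invented for this example.
ids <- c("ENSG00000171401.9", "ENSG00000170477.8", "ENSG00000163209")

# Position-based trimming, as used in this analysis.
by_position <- substring(ids, 1, 15)

# Pattern-based alternative: drop a trailing ".<digits>" if present.
by_pattern <- sub("\\.[0-9]+$", "", ids)

# Both approaches agree whenever the ID part is exactly 15 characters long.
stopifnot(identical(by_position, by_pattern))
```

The pattern-based form would also tolerate identifiers whose stem is not 15 characters long, which may matter if the same code is reused for non-human or non-gene Ensembl IDs.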

indices_mat <- SFA.ExtractTopFeatures(f_out, top_features = 100, options="min", mult.annotate = TRUE)

gene_list <- do.call(rbind, lapply(seq_len(nrow(indices_mat)), function(x) gene_names[indices_mat[x, ]]))
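The lookup above turns a matrix of row indices into a matrix of gene IDs with the same shape, one row per factor. A toy sketch of that mapping, assuming `indices_mat` is laid out as factors by top features (which is how it is indexed above); the names `toy_genes` and `toy_idx` are hypothetical and not part of the analysis:

```r
# Toy stand-ins: four gene labels and a 2 x 2 index matrix (2 factors, top 2 features each).
toy_genes <- c("geneA", "geneB", "geneC", "geneD")
toy_idx <- rbind(c(3, 1),
                 c(2, 4))

# Row-by-row lookup, mirroring the do.call(rbind, lapply(...)) pattern.
by_row <- do.call(rbind, lapply(seq_len(nrow(toy_idx)), function(x) toy_genes[toy_idx[x, ]]))

# Equivalent vectorised lookup: index once, then restore the matrix shape.
vectorised <- matrix(toy_genes[toy_idx], nrow = nrow(toy_idx))

stopifnot(identical(by_row, vectorised))
```

Row 1 of the result holds the genes for factor 1, which is why `gene_list[1, ]` is what gets passed to the annotation query.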

Factor 1 Annotations

out <- mygene::queryMany(gene_list[1, ], scopes = "ensembl.gene", fields = c("name", "summary", "symbol"), species = "human")
## Finished
## Pass returnall=TRUE to return lists of duplicate or missing query terms.
kable(as.data.frame(out))
query symbol summary X_id name
ENSG00000171401 KRT13 The protein encoded by this gene is a member of the keratin gene family. The keratins are intermediate filament proteins responsible for the structural integrity of epithelial cells and are subdivided into cytokeratins and hair keratins. Most of the type I cytokeratins consist of acidic proteins which are arranged in pairs of heterotypic keratin chains. This type I cytokeratin is paired with keratin 4 and expressed in the suprabasal layers of non-cornified stratified epithelia. Mutations in this gene and keratin 4 have been associated with the autosomal dominant disorder White Sponge Nevus. The type I cytokeratins are clustered in a region of chromosome 17q21.2. Alternative splicing of this gene results in multiple transcript variants; however, not all variants have been described. 3860 keratin 13
ENSG00000170477 KRT4 The protein encoded by this gene is a member of the keratin gene family. The type II cytokeratins consist of basic or neutral proteins which are arranged in pairs of heterotypic keratin chains coexpressed during differentiation of simple and stratified epithelial tissues. This type II cytokeratin is specifically expressed in differentiated layers of the mucosal and esophageal epithelia with family member KRT13. Mutations in these genes have been associated with White Sponge Nevus, characterized by oral, esophageal, and anal leukoplakia. The type II cytokeratins are clustered in a region of chromosome 12q12-q13. 3851 keratin 4
ENSG00000163209 SPRR3 NA 6707 small proline rich protein 3
ENSG00000163220 S100A9 The protein encoded by this gene is a member of the S100 family of proteins containing 2 EF-hand calcium-binding motifs. S100 proteins are localized in the cytoplasm and/or nucleus of a wide range of cells, and involved in the regulation of a number of cellular processes such as cell cycle progression and differentiation. S100 genes include at least 13 members which are located as a cluster on chromosome 1q21. This protein may function in the inhibition of casein kinase and altered expression of this protein is associated with the disease cystic fibrosis. This antimicrobial protein exhibits antifungal and antibacterial activity. 6280 S100 calcium binding protein A9
ENSG00000229732 AC019349.5 NA ENSG00000229732 NA
ENSG00000205420 KRT6A The protein encoded by this gene is a member of the keratin gene family. The type II cytokeratins consist of basic or neutral proteins which are arranged in pairs of heterotypic keratin chains coexpressed during differentiation of simple and stratified epithelial tissues. As many as six of this type II cytokeratin (KRT6) have been identified; the multiplicity of the genes is attributed to successive gene duplication events. The genes are expressed with family members KRT16 and/or KRT17 in the filiform papillae of the tongue, the stratified epithelial lining of oral mucosa and esophagus, the outer root sheath of hair follicles, and the glandular epithelia. This KRT6 gene in particular encodes the most abundant isoform. Mutations in these genes have been associated with pachyonychia congenita. In addition, peptides from the C-terminal region of the protein have antimicrobial activity against bacterial pathogens. The type II cytokeratins are clustered in a region of chromosome 12q12-q13. 3853 keratin 6A
ENSG00000135046 ANXA1 This gene encodes a membrane-localized protein that binds phospholipids. This protein inhibits phospholipase A2 and has anti-inflammatory activity. Loss of function or expression of this gene has been detected in multiple tumors. 301 annexin A1
ENSG00000143546 S100A8 The protein encoded by this gene is a member of the S100 family of proteins containing 2 EF-hand calcium-binding motifs. S100 proteins are localized in the cytoplasm and/or nucleus of a wide range of cells, and involved in the regulation of a number of cellular processes such as cell cycle progression and differentiation. S100 genes include at least 13 members which are located as a cluster on chromosome 1q21. This protein may function in the inhibition of casein kinase and as a cytokine. Altered expression of this protein is associated with the disease cystic fibrosis. Multiple transcript variants encoding different isoforms have been found for this gene. 6279 S100 calcium binding protein A8
ENSG00000143536 CRNN This gene encodes a member of the ‘fused gene’ family of proteins, which contain N-terminus EF-hand domains and multiple tandem peptide repeats. The encoded protein contains two EF-hand Ca2+ binding domains in its N-terminus and two glutamine- and threonine-rich 60 amino acid repeats in its C-terminus. This gene, also known as squamous epithelial heat shock protein 53, may play a role in the mucosal/epithelial immune response and epidermal differentiation. 49860 cornulin
ENSG00000140519 RHCG NA 51458 Rh family C glycoprotein
ENSG00000186081 KRT5 The protein encoded by this gene is a member of the keratin gene family. The type II cytokeratins consist of basic or neutral proteins which are arranged in pairs of heterotypic keratin chains coexpressed during differentiation of simple and stratified epithelial tissues. This type II cytokeratin is specifically expressed in the basal layer of the epidermis with family member KRT14. Mutations in these genes have been associated with a complex of diseases termed epidermolysis bullosa simplex. The type II cytokeratins are clustered in a region of chromosome 12q12-q13. 3852 keratin 5
ENSG00000160213 CSTB The cystatin superfamily encompasses proteins that contain multiple cystatin-like sequences. Some of the members are active cysteine protease inhibitors, while others have lost or perhaps never acquired this inhibitory activity. There are three inhibitory families in the superfamily, including the type 1 cystatins (stefins), type 2 cystatins and kininogens. This gene encodes a stefin that functions as an intracellular thiol protease inhibitor. The protein is able to form a dimer stabilized by noncovalent forces, inhibiting papain and cathepsins l, h and b. The protein is thought to play a role in protecting against the proteases leaking from lysosomes. Evidence indicates that mutations in this gene are responsible for the primary defects in patients with progressive myoclonic epilepsy (EPM1). 1476 cystatin B
ENSG00000118898 PPL The protein encoded by this gene is a component of desmosomes and of the epidermal cornified envelope in keratinocytes. The N-terminal domain of this protein interacts with the plasma membrane and its C-terminus interacts with intermediate filaments. Through its rod domain, this protein forms complexes with envoplakin. This protein may serve as a link between the cornified envelope and desmosomes as well as intermediate filaments. AKT1/PKB, a protein kinase mediating a variety of cell growth and survival signaling processes, is reported to interact with this protein, suggesting a possible role for this protein as a localization signal in AKT1-mediated signaling. 5493 periplakin
ENSG00000134531 EMP1 NA 2012 epithelial membrane protein 1
ENSG00000107317 PTGDS The protein encoded by this gene is a glutathione-independent prostaglandin D synthase that catalyzes the conversion of prostaglandin H2 (PGH2) to prostaglandin D2 (PGD2). PGD2 functions as a neuromodulator as well as a trophic factor in the central nervous system. PGD2 is also involved in smooth muscle contraction/relaxation and is a potent inhibitor of platelet aggregation. This gene is preferentially expressed in brain. Studies with transgenic mice overexpressing this gene suggest that this gene may be also involved in the regulation of non-rapid eye movement sleep. 5730 prostaglandin D2 synthase
ENSG00000197971 MBP The protein encoded by the classic MBP gene is a major constituent of the myelin sheath of oligodendrocytes and Schwann cells in the nervous system. However, MBP-related transcripts are also present in the bone marrow and the immune system. These mRNAs arise from the long MBP gene (otherwise called ‘Golli-MBP’) that contains 3 additional exons located upstream of the classic MBP exons. Alternative splicing from the Golli and the MBP transcription start sites gives rise to 2 sets of MBP-related transcripts and gene products. The Golli mRNAs contain 3 exons unique to Golli-MBP, spliced in-frame to 1 or more MBP exons. They encode hybrid proteins that have N-terminal Golli aa sequence linked to MBP aa sequence. The second family of transcripts contain only MBP exons and produce the well characterized myelin basic proteins. This complex gene structure is conserved among species suggesting that the MBP transcription unit is an integral part of the Golli transcription unit and that this arrangement is important for the function and/or regulation of these genes. 4155 myelin basic protein
ENSG00000121552 CSTA The cystatin superfamily encompasses proteins that contain multiple cystatin-like sequences. Some of the members are active cysteine protease inhibitors, while others have lost or perhaps never acquired this inhibitory activity. There are three inhibitory families in the superfamily, including the type 1 cystatins (stefins), type 2 cystatins, and kininogens. This gene encodes a stefin that functions as a cysteine protease inhibitor, forming tight complexes with papain and the cathepsins B, H, and L. The protein is one of the precursor proteins of cornified cell envelope in keratinocytes and plays a role in epidermal development and maintenance. Stefins have been proposed as prognostic and diagnostic tools for cancer. 1475 cystatin A
ENSG00000111640 GAPDH This gene encodes a member of the glyceraldehyde-3-phosphate dehydrogenase protein family. The encoded protein has been identified as a moonlighting protein based on its ability to perform mechanistically distinct functions. The product of this gene catalyzes an important energy-yielding step in carbohydrate metabolism, the reversible oxidative phosphorylation of glyceraldehyde-3-phosphate in the presence of inorganic phosphate and nicotinamide adenine dinucleotide (NAD). The encoded protein has additionally been identified to have uracil DNA glycosylase activity in the nucleus. Also, this protein contains a peptide that has antimicrobial activity against E. coli, P. aeruginosa, and C. albicans. Studies of a similar protein in mouse have assigned a variety of additional functions including nitrosylation of nuclear proteins, the regulation of mRNA stability, and acting as a transferrin receptor on the cell surface of macrophage. Many pseudogenes similar to this locus are present in the human genome. Alternative splicing results in multiple transcript variants. 2597 glyceraldehyde-3-phosphate dehydrogenase
ENSG00000143369 ECM1 This gene encodes a soluble protein that is involved in endochondral bone formation, angiogenesis, and tumor biology. It also interacts with a variety of extracellular and structural proteins, contributing to the maintenance of skin integrity and homeostasis. Mutations in this gene are associated with lipoid proteinosis disorder (also known as hyalinosis cutis et mucosae or Urbach-Wiethe disease) that is characterized by generalized thickening of skin, mucosae and certain viscera. Alternatively spliced transcript variants encoding distinct isoforms have been described for this gene. 1893 extracellular matrix protein 1
ENSG00000065978 YBX1 This gene encodes a highly conserved cold shock domain protein that has broad nucleic acid binding properties. The encoded protein functions as both a DNA and RNA binding protein and has been implicated in numerous cellular processes including regulation of transcription and translation, pre-mRNA splicing, DNA reparation and mRNA packaging. This protein is also a component of messenger ribonucleoprotein (mRNP) complexes and may have a role in microRNA processing. This protein can be secreted through non-classical pathways and functions as an extracellular mitogen. Aberrant expression of the gene is associated with cancer proliferation in numerous tissues. This gene may be a prognostic marker for poor outcome and drug resistance in certain cancers. Alternate splicing results in multiple transcript variants. Pseudogenes of this gene are found on multiple chromosomes. 4904 Y-box binding protein 1
ENSG00000133710 SPINK5 This gene encodes a multidomain serine protease inhibitor that contains 15 potential inhibitory domains. The encoded preproprotein is proteolytically processed to generate multiple protein products, which may exhibit unique activities and specificities. These proteins may play a role in skin and hair morphogenesis, as well as anti-inflammatory and antimicrobial protection of mucous epithelia. Mutations in this gene may result in Netherton syndrome, a disorder characterized by ichthyosis, defective cornification, and atopy. This gene is present in a gene cluster on chromosome 5. Alternative splicing results in multiple transcript variants. 11005 serine peptidase inhibitor, Kazal type 5
ENSG00000163017 ACTG2 Actins are highly conserved proteins that are involved in various types of cell motility and in the maintenance of the cytoskeleton. Three types of actins, alpha, beta and gamma, have been identified in vertebrates. Alpha actins are found in muscle tissues and are a major constituent of the contractile apparatus. The beta and gamma actins co-exist in most cell types as components of the cytoskeleton and as mediators of internal cell motility. This gene encodes actin gamma 2; a smooth muscle actin found in enteric tissues. Alternative splicing results in multiple transcript variants encoding distinct isoforms. Based on similarity to peptide cleavage of related actins, the mature protein of this gene is formed by removal of two N-terminal peptides. 72 actin, gamma 2, smooth muscle, enteric
ENSG00000125780 TGM3 Transglutaminases are enzymes that catalyze the crosslinking of proteins by epsilon-gamma glutamyl lysine isopeptide bonds. While the primary structure of transglutaminases is not conserved, they all have the same amino acid sequence at their active sites and their activity is calcium-dependent. The protein encoded by this gene consists of two polypeptide chains activated from a single precursor protein by proteolysis. The encoded protein is involved in the later stages of cell envelope formation in the epidermis and hair follicle. 7053 transglutaminase 3
ENSG00000124942 AHNAK NA 79026 AHNAK nucleoprotein
ENSG00000060138 YBX3 NA 8531 Y-box binding protein 3
ENSG00000047849 MAP4 The protein encoded by this gene is a major non-neuronal microtubule-associated protein. This protein contains a domain similar to the microtubule-binding domains of neuronal microtubule-associated protein (MAP2) and microtubule-associated protein tau (MAPT/TAU). This protein promotes microtubule assembly, and has been shown to counteract destabilization of interphase microtubule catastrophe promotion. Cyclin B was found to interact with this protein, which targets cell division cycle 2 (CDC2) kinase to microtubules. The phosphorylation of this protein affects microtubule properties and cell cycle progression. Multiple transcript variants encoding different isoforms have been found for this gene. 4134 microtubule associated protein 4
ENSG00000189334 S100A14 This gene encodes a member of the S100 protein family which contains an EF-hand motif and binds calcium. The gene is located in a cluster of S100 genes on chromosome 1. Levels of the encoded protein have been found to be lower in cancerous tissue and associated with metastasis suggesting a tumor suppressor function (PMID: 19956863, 19351828). 57402 S100 calcium binding protein A14
ENSG00000009307 CSDE1 NA 7812 cold shock domain containing E1
ENSG00000080824 HSP90AA1 The protein encoded by this gene is an inducible molecular chaperone that functions as a homodimer. The encoded protein aids in the proper folding of specific target proteins by use of an ATPase activity that is modulated by co-chaperones. Two transcript variants encoding different isoforms have been found for this gene. 3320 heat shock protein 90kDa alpha family class A member 1
ENSG00000174437 ATP2A2 This gene encodes one of the SERCA Ca(2+)-ATPases, which are intracellular pumps located in the sarcoplasmic or endoplasmic reticula of muscle cells. This enzyme catalyzes the hydrolysis of ATP coupled with the translocation of calcium from the cytosol into the sarcoplasmic reticulum lumen, and is involved in regulation of the contraction/relaxation cycle. Mutations in this gene cause Darier-White disease, also known as keratosis follicularis, an autosomal dominant skin disorder characterized by loss of adhesion between epidermal cells and abnormal keratinization. Alternative splicing results in multiple transcript variants encoding different isoforms. 488 ATPase sarcoplasmic/endoplasmic reticulum Ca2+ transporting 2
ENSG00000171345 KRT19 The protein encoded by this gene is a member of the keratin family. The keratins are intermediate filament proteins responsible for the structural integrity of epithelial cells and are subdivided into cytokeratins and hair keratins. The type I cytokeratins consist of acidic proteins which are arranged in pairs of heterotypic keratin chains. Unlike its related family members, this smallest known acidic cytokeratin is not paired with a basic cytokeratin in epithelial cells. It is specifically expressed in the periderm, the transiently superficial layer that envelopes the developing epidermis. The type I cytokeratins are clustered in a region of chromosome 17q12-q21. 3880 keratin 19
ENSG00000169474 SPRR1A NA 6698 small proline rich protein 1A
ENSG00000152556 PFKM Three phosphofructokinase isozymes exist in humans: muscle, liver and platelet. These isozymes function as subunits of the mammalian tetramer phosphofructokinase, which catalyzes the phosphorylation of fructose-6-phosphate to fructose-1,6-bisphosphate. Tetramer composition varies depending on tissue type. This gene encodes the muscle-type isozyme. Mutations in this gene have been associated with glycogen storage disease type VII, also known as Tarui disease. Alternatively spliced transcript variants have been described. 5213 phosphofructokinase, muscle
ENSG00000165272 AQP3 This gene encodes the water channel protein aquaporin 3. Aquaporins are a family of small integral membrane proteins related to the major intrinsic protein, also known as aquaporin 0. Aquaporin 3 is localized at the basal lateral membranes of collecting duct cells in the kidney. In addition to its water channel function, aquaporin 3 has been found to facilitate the transport of nonionic small solutes such as urea and glycerol, but to a smaller degree. It has been suggested that water channels can be functionally heterogeneous and possess water and solute permeation mechanisms. Alternative splicing of this gene results in multiple transcript variants encoding different isoforms. 360 aquaporin 3 (Gill blood group)
ENSG00000184292 TACSTD2 This intronless gene encodes a carcinoma-associated antigen. This antigen is a cell surface receptor that transduces calcium signals. Mutations of this gene have been associated with gelatinous drop-like corneal dystrophy. 4070 tumor-associated calcium signal transducer 2
ENSG00000241794 SPRR2A NA 6700 small proline rich protein 2A
ENSG00000170315 UBB This gene encodes ubiquitin, one of the most conserved proteins known. Ubiquitin has a major role in targeting cellular proteins for degradation by the 26S proteasome. It is also involved in the maintenance of chromatin structure, the regulation of gene expression, and the stress response. Ubiquitin is synthesized as a precursor protein consisting of either polyubiquitin chains or a single ubiquitin moiety fused to an unrelated protein. This gene consists of three direct repeats of the ubiquitin coding sequence with no spacer sequence. Consequently, the protein is expressed as a polyubiquitin precursor with a final amino acid after the last repeat. An aberrant form of this protein has been detected in patients with Alzheimer’s disease and Down syndrome. Pseudogenes of this gene are located on chromosomes 1, 2, 13, and 17. Alternative splicing results in multiple transcript variants. 7314 ubiquitin B
ENSG00000136689 IL1RN The protein encoded by this gene is a member of the interleukin 1 cytokine family. This protein inhibits the activities of interleukin 1, alpha (IL1A) and interleukin 1, beta (IL1B), and modulates a variety of interleukin 1 related immune and inflammatory responses. This gene and five other closely related cytokine genes form a gene cluster spanning approximately 400 kb on chromosome 2. A polymorphism of this gene is reported to be associated with increased risk of osteoporotic fractures and gastric cancer. Several alternatively spliced transcript variants encoding distinct isoforms have been reported. 3557 interleukin 1 receptor antagonist
ENSG00000114416 FXR1 The protein encoded by this gene is an RNA binding protein that interacts with the functionally-similar proteins FMR1 and FXR2. These proteins shuttle between the nucleus and cytoplasm and associate with polyribosomes, predominantly with the 60S ribosomal subunit. Three transcript variants encoding different isoforms have been found for this gene. 8087 FMR1 autosomal homolog 1
ENSG00000171346 KRT15 The protein encoded by this gene is a member of the keratin gene family. The keratins are intermediate filament proteins responsible for the structural integrity of epithelial cells and are subdivided into cytokeratins and hair keratins. Most of the type I cytokeratins consist of acidic proteins which are arranged in pairs of heterotypic keratin chains and are clustered in a region on chromosome 17q21.2. 3866 keratin 15
ENSG00000136811 ODF2 The outer dense fibers are cytoskeletal structures that surround the axoneme in the middle piece and principal piece of the sperm tail. The fibers function in maintaining the elastic structure and recoil of the sperm tail as well as in protecting the tail from shear forces during epididymal transport and ejaculation. Defects in the outer dense fibers lead to abnormal sperm morphology and infertility. This gene encodes one of the major outer dense fiber proteins. Alternative splicing results in multiple transcript variants. The longer transcripts, also known as ‘Cenexins’, encode proteins with a C-terminal extension that are differentially targeted to somatic centrioles and thought to be crucial for the formation of microtubule organizing centers. 4957 outer dense fiber of sperm tails 2
ENSG00000126777 KTN1 This gene encodes an integral membrane protein that is a member of the kinectin protein family. The encoded protein is primarily localized to the endoplasmic reticulum membrane. This protein binds kinesin and may be involved in intracellular organelle motility. This protein also binds translation elongation factor-delta and may be involved in the assembly of the elongation factor-1 complex. Alternate splicing results in multiple transcript variants of this gene. 3895 kinectin 1
ENSG00000120885 CLU The protein encoded by this gene is a secreted chaperone that can under some stress conditions also be found in the cell cytosol. It has been suggested to be involved in several basic biological events such as cell death, tumor progression, and neurodegenerative disorders. Alternate splicing results in both coding and non-coding variants. 1191 clusterin
ENSG00000175793 SFN NA 2810 stratifin
ENSG00000198467 TPM2 This gene encodes beta-tropomyosin, a member of the actin filament binding protein family, and mainly expressed in slow, type 1 muscle fibers. Mutations in this gene can alter the expression of other sarcomeric tropomyosin proteins, and cause cap disease, nemaline myopathy and distal arthrogryposis syndromes. Alternatively spliced transcript variants encoding different isoforms have been found for this gene. 7169 tropomyosin 2 (beta)
ENSG00000163191 S100A11 The protein encoded by this gene is a member of the S100 family of proteins containing 2 EF-hand calcium-binding motifs. S100 proteins are localized in the cytoplasm and/or nucleus of a wide range of cells, and involved in the regulation of a number of cellular processes such as cell cycle progression and differentiation. S100 genes include at least 13 members which are located as a cluster on chromosome 1q21. This protein may function in motility, invasion, and tubulin polymerization. Chromosomal rearrangements and altered expression of this gene have been implicated in tumor metastasis. 6282 S100 calcium binding protein A11
ENSG00000160014 CALM3 NA 808 calmodulin 3 (phosphorylase kinase, delta)
ENSG00000160014 CALM2 This gene is a member of the calmodulin gene family. There are three distinct calmodulin genes dispersed throughout the genome that encode the identical protein, but differ at the nucleotide level. Calmodulin is a calcium binding protein that plays a role in signaling pathways, cell cycle progression and proliferation. Several infants with severe forms of long-QT syndrome (LQTS) who displayed life-threatening ventricular arrhythmias together with delayed neurodevelopment and epilepsy were found to have mutations in either this gene or another member of the calmodulin gene family (PMID:23388215). Mutations in this gene have also been identified in patients with less severe forms of LQTS (PMID:24917665), while mutations in another calmodulin gene family member have been associated with catecholaminergic polymorphic ventricular tachycardia (CPVT)(PMID:23040497), a rare disorder thought to be the cause of a significant fraction of sudden cardiac deaths in young individuals. Pseudogenes of this gene are found on chromosomes 10, 13, and 17. Alternative splicing results in multiple transcript variants encoding different isoforms. 805 calmodulin 2 (phosphorylase kinase, delta)
ENSG00000178104 PDE4DIP The protein encoded by this gene serves to anchor phosphodiesterase 4D to the Golgi/centrosome region of the cell. Defects in this gene may be a cause of myeloproliferative disorder (MBD) associated with eosinophilia. Several transcript variants encoding different isoforms have been found for this gene. 9659 phosphodiesterase 4D interacting protein
ENSG00000154358 OBSCN The obscurin gene spans more than 150 kb, contains over 80 exons and encodes a protein of approximately 720 kDa. The encoded protein contains 68 Ig domains, 2 fibronectin domains, 1 calcium/calmodulin-binding domain, 1 RhoGEF domain with an associated PH domain, and 2 serine-threonine kinase domains. This protein belongs to the family of giant sarcomeric signaling proteins that includes titin and nebulin, and may have a role in the organization of myofibrils during assembly and may mediate interactions between the sarcoplasmic reticulum and myofibrils. Alternatively spliced transcript variants encoding different isoforms have been identified. 84033 obscurin, cytoskeletal calmodulin and titin-interacting RhoGEF
ENSG00000065150 IPO5 Nucleocytoplasmic transport, a signal- and energy-dependent process, takes place through nuclear pore complexes embedded in the nuclear envelope. The import of proteins containing a nuclear localization signal (NLS) requires the NLS import receptor, a heterodimer of importin alpha and beta subunits also known as karyopherins. Importin alpha binds the NLS-containing cargo in the cytoplasm and importin beta docks the complex at the cytoplasmic side of the nuclear pore complex. In the presence of nucleoside triphosphates and the small GTP binding protein Ran, the complex moves into the nuclear pore complex and the importin subunits dissociate. Importin alpha enters the nucleoplasm with its passenger protein and importin beta remains at the pore. Interactions between importin beta and the FG repeats of nucleoporins are essential in translocation through the pore complex. The protein encoded by this gene is a member of the importin beta family. 3843 importin 5
ENSG00000172005 MAL The protein encoded by this gene is a highly hydrophobic integral membrane protein belonging to the MAL family of proteolipids. The protein has been localized to the endoplasmic reticulum of T-cells and is a candidate linker protein in T-cell signal transduction. In addition, this proteolipid is localized in compact myelin of cells in the nervous system and has been implicated in myelin biogenesis and/or function. The protein plays a role in the formation, stabilization and maintenance of glycosphingolipid-enriched membrane microdomains. Down-regulation of this gene has been associated with a variety of human epithelial malignancies. Alternative splicing produces four transcript variants which vary from each other by the presence or absence of alternatively spliced exons 2 and 3. 4118 mal, T-cell differentiation protein
ENSG00000078674 PCM1 The protein encoded by this gene is a component of centriolar satellites, which are electron dense granules scattered around centrosomes. Inhibition studies show that this protein is essential for the correct localization of several centrosomal proteins, and for anchoring microtubules to the centrosome. Chromosomal aberrations involving this gene are associated with papillary thyroid carcinomas and a variety of hematological malignancies, including atypical chronic myeloid leukemia and T-cell lymphoma. Multiple transcript variants encoding different isoforms have been found for this gene. 5108 pericentriolar material 1
ENSG00000149925 ALDOA The protein encoded by this gene, Aldolase A (fructose-bisphosphate aldolase), is a glycolytic enzyme that catalyzes the reversible conversion of fructose-1,6-bisphosphate to glyceraldehyde 3-phosphate and dihydroxyacetone phosphate. Three aldolase isozymes (A, B, and C), encoded by three different genes, are differentially expressed during development. Aldolase A is found in the developing embryo and is produced in even greater amounts in adult muscle. Aldolase A expression is repressed in adult liver, kidney and intestine and similar to aldolase C levels in brain and other nervous tissue. Aldolase A deficiency has been associated with myopathy and hemolytic anemia. Alternative splicing and alternative promoter usage results in multiple transcript variants. Related pseudogenes have been identified on chromosomes 3 and 10. 226 aldolase, fructose-bisphosphate A
ENSG00000092295 TGM1 The protein encoded by this gene is a membrane protein that catalyzes the addition of an alkyl group from an alkylamine to a glutamine residue of a protein, forming an alkylglutamine in the protein. This protein alkylation leads to crosslinking of proteins and catenation of polyamines to proteins. This gene contains either one or two copies of a 22 nt repeat unit in its 3’ UTR. Mutations in this gene have been associated with autosomal recessive lamellar ichthyosis (LI) and nonbullous congenital ichthyosiform erythroderma (NCIE). 7051 transglutaminase 1
ENSG00000143549 TPM3 This gene encodes a member of the tropomyosin family of actin-binding proteins. Tropomyosins are dimers of coiled-coil proteins that provide stability to actin filaments and regulate access of other actin-binding proteins. Mutations in this gene result in autosomal dominant nemaline myopathy and other muscle disorders. This locus is involved in translocations with other loci, including anaplastic lymphoma receptor tyrosine kinase (ALK) and neurotrophic tyrosine kinase receptor type 1 (NTRK1), which result in the formation of fusion proteins that act as oncogenes. There are numerous pseudogenes for this gene on different chromosomes. Alternative splicing results in multiple transcript variants. 7170 tropomyosin 3
ENSG00000172270 BSG The protein encoded by this gene is a plasma membrane protein that is important in spermatogenesis, embryo implantation, neural network formation, and tumor progression. The encoded protein is also a member of the immunoglobulin superfamily. Multiple transcript variants encoding different isoforms have been found for this gene. 682 basigin (Ok blood group)
ENSG00000186395 KRT10 This gene encodes a member of the type I (acidic) cytokeratin family, which belongs to the superfamily of intermediate filament (IF) proteins. Keratins are heteropolymeric structural proteins which form the intermediate filament. These filaments, along with actin microfilaments and microtubules, compose the cytoskeleton of epithelial cells. Mutations in this gene are associated with epidermolytic hyperkeratosis. This gene is located within a cluster of keratin family members on chromosome 17q21. 3858 keratin 10
ENSG00000185787 MORF4L1 NA 10933 mortality factor 4 like 1
ENSG00000167468 GPX4 This gene encodes a member of the glutathione peroxidase protein family. Glutathione peroxidase catalyzes the reduction of hydrogen peroxide, organic hydroperoxide, and lipid peroxides by reduced glutathione and functions in the protection of cells against oxidative damage. Human plasma glutathione peroxidase has been shown to be a selenium-containing enzyme and the UGA codon is translated into a selenocysteine. The encoded protein has been identified as a moonlighting protein based on its ability to serve dual functions as a peroxidase as well as a structural protein in mature spermatozoa. Through alternative splicing and transcription initiation, rat produces proteins that localize to the nucleus, mitochondrion, and cytoplasm. In humans, alternative transcription initiation and the cleavage sites of the mitochondrial and nuclear transit peptides need to be experimentally verified. Alternative splicing results in multiple transcript variants. 2879 glutathione peroxidase 4
ENSG00000129250 KIF1C The protein encoded by this gene is a member of the kinesin-like protein family. The family members are microtubule-dependent molecular motors that transport organelles within cells and move chromosomes during cell division. Mutations in this gene are a cause of spastic ataxia 2, autosomal recessive. 10749 kinesin family member 1C
ENSG00000134202 GSTM3 Cytosolic and membrane-bound forms of glutathione S-transferase are encoded by two distinct supergene families. At present, eight distinct classes of the soluble cytoplasmic mammalian glutathione S-transferases have been identified: alpha, kappa, mu, omega, pi, sigma, theta and zeta. This gene encodes a glutathione S-transferase that belongs to the mu class. The mu class of enzymes functions in the detoxification of electrophilic compounds, including carcinogens, therapeutic drugs, environmental toxins and products of oxidative stress, by conjugation with glutathione. The genes encoding the mu class of enzymes are organized in a gene cluster on chromosome 1p13.3 and are known to be highly polymorphic. These genetic variations can change an individual’s susceptibility to carcinogens and toxins as well as affect the toxicity and efficacy of certain drugs. Mutations of this class mu gene have been linked with a slight increase in a number of cancers, likely due to exposure with environmental toxins. Alternative splicing results in multiple transcript variants. 2947 glutathione S-transferase mu 3 (brain)
ENSG00000196465 MYL6B Myosin is a hexameric ATPase cellular motor protein. It is composed of two heavy chains, two nonphosphorylatable alkali light chains, and two phosphorylatable regulatory light chains. This gene encodes a myosin alkali light chain expressed in both slow-twitch skeletal muscle and in nonmuscle tissue. Alternative splicing results in multiple transcript variants. 140465 myosin light chain 6B
ENSG00000153827 TRIP12 NA 9320 thyroid hormone receptor interactor 12
ENSG00000109846 CRYAB Mammalian lens crystallins are divided into alpha, beta, and gamma families. Alpha crystallins are composed of two gene products: alpha-A and alpha-B, for acidic and basic, respectively. Alpha crystallins can be induced by heat shock and are members of the small heat shock protein (HSP20) family. They act as molecular chaperones although they do not renature proteins and release them in the fashion of a true chaperone; instead they hold them in large soluble aggregates. Post-translational modifications decrease the ability to chaperone. These heterogeneous aggregates consist of 30-40 subunits; the alpha-A and alpha-B subunits have a 3:1 ratio, respectively. Two additional functions of alpha crystallins are an autokinase activity and participation in the intracellular architecture. The encoded protein has been identified as a moonlighting protein based on its ability to perform mechanistically distinct functions. Alpha-A and alpha-B gene products are differentially expressed; alpha-A is preferentially restricted to the lens and alpha-B is expressed widely in many tissues and organs. Elevated expression of alpha-B crystallin occurs in many neurological diseases; a missense mutation cosegregated in a family with a desmin-related myopathy. Alternative splicing results in multiple transcript variants. 1410 crystallin alpha B
ENSG00000165474 GJB2 This gene encodes a member of the gap junction protein family. The gap junctions were first characterized by electron microscopy as regionally specialized structures on plasma membranes of contacting adherent cells. These structures were shown to consist of cell-to-cell channels that facilitate the transfer of ions and small molecules between cells. The gap junction proteins, also known as connexins, purified from fractions of enriched gap junctions from different tissues differ. According to sequence similarities at the nucleotide and amino acid levels, the gap junction proteins are divided into two categories, alpha and beta. Mutations in this gene are responsible for as much as 50% of pre-lingual, recessive deafness. 2706 gap junction protein beta 2
ENSG00000128591 FLNC This gene encodes one of three related filamin genes, specifically gamma filamin. These filamin proteins crosslink actin filaments into orthogonal networks in cortical cytoplasm and participate in the anchoring of membrane proteins for the actin cytoskeleton. Three functional domains exist in filamin: an N-terminal filamentous actin-binding domain, a C-terminal self-association domain, and a membrane glycoprotein-binding domain. Two transcript variants encoding different isoforms have been found for this gene. 2318 filamin C
ENSG00000204469 PRRC2A A cluster of genes, BAT1-BAT5, has been localized in the vicinity of the genes for TNF alpha and TNF beta. These genes are all within the human major histocompatibility complex class III region. This gene has microsatellite repeats which are associated with the age-at-onset of insulin-dependent diabetes mellitus (IDDM) and possibly thought to be involved with the inflammatory process of pancreatic beta-cell destruction during the development of IDDM. This gene is also a candidate gene for the development of rheumatoid arthritis. Two transcript variants encoding the same protein have been found for this gene. 7916 proline rich coiled-coil 2A
ENSG00000109971 HSPA8 This gene encodes a member of the heat shock protein 70 family, which contains both heat-inducible and constitutively expressed members. This protein belongs to the latter group, which are also referred to as heat-shock cognate proteins. It functions as a chaperone, and binds to nascent polypeptides to facilitate correct folding. It also functions as an ATPase in the disassembly of clathrin-coated vesicles during transport of membrane components through the cell. Alternatively spliced transcript variants encoding different isoforms have been found for this gene. 3312 heat shock protein family A (Hsp70) member 8
ENSG00000021355 SERPINB1 The protein encoded by this gene is a member of the serpin family of proteinase inhibitors. Members of this family maintain homeostasis by neutralizing overexpressed proteinase activity through their function as suicide substrates. This protein inhibits the neutrophil-derived proteinases neutrophil elastase, cathepsin G, and proteinase-3 and thus protects tissues from damage at inflammatory sites. Alternative splicing results in multiple transcript variants. 1992 serpin family B member 1
ENSG00000075151 EIF4G3 The protein encoded by this gene is thought to be part of the eIF4F protein complex, which is involved in mRNA cap recognition and transport of mRNAs to the ribosome. Interestingly, a microRNA (miR-520c-3p) has been found that negatively regulates synthesis of the encoded protein, and this leads to a global decrease in protein translation and cell proliferation. Therefore, this protein is a key component of the anti-tumor activity of miR-520c-3p. 8672 eukaryotic translation initiation factor 4 gamma 3
ENSG00000115414 FN1 This gene encodes fibronectin, a glycoprotein present in a soluble dimeric form in plasma, and in a dimeric or multimeric form at the cell surface and in extracellular matrix. The encoded preproprotein is proteolytically processed to generate the mature protein. Fibronectin is involved in cell adhesion and migration processes including embryogenesis, wound healing, blood coagulation, host defense, and metastasis. The gene has three regions subject to alternative splicing, with the potential to produce 20 different transcript variants, at least one of which encodes an isoform that undergoes proteolytic processing. The full-length nature of some variants has not been determined. 2335 fibronectin 1
ENSG00000067225 PKM This gene encodes a protein involved in glycolysis. The encoded protein is a pyruvate kinase that catalyzes the transfer of a phosphoryl group from phosphoenolpyruvate to ADP, generating ATP and pyruvate. This protein has been shown to interact with thyroid hormone and may mediate cellular metabolic effects induced by thyroid hormones. This protein has been found to bind Opa protein, a bacterial outer membrane protein involved in gonococcal adherence to and invasion of human cells, suggesting a role of this protein in bacterial pathogenesis. Several alternatively spliced transcript variants encoding a few distinct isoforms have been reported. 5315 pyruvate kinase, muscle
ENSG00000070371 CLTCL1 This gene is a member of the clathrin heavy chain family and encodes a major protein of the polyhedral coat of coated pits and vesicles. Chromosomal aberrations involving this gene are associated with meningioma, DiGeorge syndrome, and velo-cardio-facial syndrome. Multiple transcript variants encoding different isoforms have been found for this gene. 8218 clathrin heavy chain like 1
ENSG00000134243 SORT1 This gene encodes a member of the VPS10-related sortilin family of proteins. The encoded preproprotein is proteolytically processed by furin to generate the mature receptor. This receptor plays a role in the trafficking of different proteins to either the cell surface, or subcellular compartments such as lysosomes and endosomes. Expression levels of this gene may influence the risk of myocardial infarction in human patients. Alternative splicing results in multiple transcript variants. 6272 sortilin 1
ENSG00000082641 NFE2L1 This gene encodes a protein that is involved in globin gene expression in erythrocytes. Confusion has occurred in bibliographic databases due to the shared symbol of NRF1 for this gene, NFE2L1, and for ‘nuclear respiratory factor 1’ which has an official symbol of NRF1. 4779 nuclear factor, erythroid 2 like 1
ENSG00000158828 PINK1 This gene encodes a serine/threonine protein kinase that localizes to mitochondria. It is thought to protect cells from stress-induced mitochondrial dysfunction. Mutations in this gene cause one form of autosomal recessive early-onset Parkinson disease. 65018 PTEN induced putative kinase 1
ENSG00000188554 NBR1 The protein encoded by this gene was originally identified as an ovarian tumor antigen monitored in ovarian cancer. The encoded protein contains a B-box/coiled-coil motif, which is present in many genes with transformation potential. It functions as a specific autophagy receptor for the selective autophagic degradation of peroxisomes by forming intracellular inclusions with ubiquitylated autophagic substrates. This gene is located on a region of chromosome 17q21.1 that is in close proximity to the BRCA1 tumor suppressor gene. Alternative splicing of this gene results in multiple transcript variants. 4077 NBR1, autophagy cargo receptor
ENSG00000160299 PCNT The protein encoded by this gene binds to calmodulin and is expressed in the centrosome. It is an integral component of the pericentriolar material (PCM). The protein contains a series of coiled-coil domains and a highly conserved PCM targeting motif called the PACT domain near its C-terminus. The protein interacts with the microtubule nucleation component gamma-tubulin and is likely important to normal functioning of the centrosomes, cytoskeleton, and cell-cycle progression. Mutations in this gene cause Seckel syndrome-4 and microcephalic osteodysplastic primordial dwarfism type II. Two transcript variants encoding different isoforms have been found for this gene. 5116 pericentrin
ENSG00000142156 COL6A1 The collagens are a superfamily of proteins that play a role in maintaining the integrity of various tissues. Collagens are extracellular matrix proteins and have a triple-helical domain as their common structural element. Collagen VI is a major structural component of microfibrils. The basic structural unit of collagen VI is a heterotrimer of the alpha1(VI), alpha2(VI), and alpha3(VI) chains. The alpha2(VI) and alpha3(VI) chains are encoded by the COL6A2 and COL6A3 genes, respectively. The protein encoded by this gene is the alpha 1 subunit of type VI collagen (alpha1(VI) chain). Mutations in the genes that code for the collagen VI subunits result in the autosomal dominant disorder, Bethlem myopathy. 1291 collagen type VI alpha 1
ENSG00000141753 IGFBP4 This gene is a member of the insulin-like growth factor binding protein (IGFBP) family and encodes a protein with an IGFBP domain and a thyroglobulin type-I domain. The protein binds both insulin-like growth factors (IGFs) I and II and circulates in the plasma in both glycosylated and non-glycosylated forms. Binding of this protein prolongs the half-life of the IGFs and alters their interaction with cell surface receptors. 3487 insulin like growth factor binding protein 4
ENSG00000204592 HLA-E HLA-E belongs to the HLA class I heavy chain paralogues. This class I molecule is a heterodimer consisting of a heavy chain and a light chain (beta-2 microglobulin). The heavy chain is anchored in the membrane. HLA-E binds a restricted subset of peptides derived from the leader peptides of other class I molecules. The heavy chain is approximately 45 kDa and its gene contains 8 exons. Exon one encodes the leader peptide, exons 2 and 3 encode the alpha1 and alpha2 domains, which both bind the peptide, exon 4 encodes the alpha3 domain, exon 5 encodes the transmembrane region, and exons 6 and 7 encode the cytoplasmic tail. 3133 major histocompatibility complex, class I, E
ENSG00000126803 HSPA2 NA 3306 heat shock protein family A (Hsp70) member 2
ENSG00000204463 BAG6 This gene was first characterized as part of a cluster of genes located within the human major histocompatibility complex class III region. This gene encodes a nuclear protein that is cleaved by caspase 3 and is implicated in the control of apoptosis. In addition, the protein forms a complex with E1A binding protein p300 and is required for the acetylation of p53 in response to DNA damage. Multiple transcript variants encoding different isoforms have been found for this gene. 7917 BCL2 associated athanogene 6
ENSG00000131095 GFAP This gene encodes one of the major intermediate filament proteins of mature astrocytes. It is used as a marker to distinguish astrocytes from other glial cells during development. Mutations in this gene cause Alexander disease, a rare disorder of astrocytes in the central nervous system. Alternative splicing results in multiple transcript variants encoding distinct isoforms. 2670 glial fibrillary acidic protein
ENSG00000115758 ODC1 This gene encodes the rate-limiting enzyme of the polyamine biosynthesis pathway which catalyzes ornithine to putrescine. The activity level for the enzyme varies in response to growth-promoting stimuli and exhibits a high turnover rate in comparison to other mammalian proteins. Originally localized to both chromosomes 2 and 7, the gene encoding this enzyme has been determined to be located on 2p25, with a pseudogene located on 7q31-qter. Multiple alternatively spliced transcript variants encoding distinct isoforms have been identified. 4953 ornithine decarboxylase 1
ENSG00000103994 ZNF106 NA 64397 zinc finger protein 106
ENSG00000089737 DDX24 DEAD box proteins, characterized by the conserved motif Asp-Glu-Ala-Asp (DEAD), are putative RNA helicases. They are implicated in a number of cellular processes involving alteration of RNA secondary structure such as translation initiation, nuclear and mitochondrial splicing, and ribosome and spliceosome assembly. Based on their distribution patterns, some members of this family are believed to be involved in embryogenesis, spermatogenesis, and cellular growth and division. This gene encodes a DEAD box protein, which shows little similarity to any of the other known human DEAD box proteins, but shows a high similarity to mouse Ddx24 at the amino acid level. 57062 DEAD-box helicase 24
ENSG00000018625 ATP1A2 The protein encoded by this gene belongs to the family of P-type cation transport ATPases, and to the subfamily of Na+/K+ -ATPases. Na+/K+ -ATPase is an integral membrane protein responsible for establishing and maintaining the electrochemical gradients of Na and K ions across the plasma membrane. These gradients are essential for osmoregulation, for sodium-coupled transport of a variety of organic and inorganic molecules, and for electrical excitability of nerve and muscle. This enzyme is composed of two subunits, a large catalytic subunit (alpha) and a smaller glycoprotein subunit (beta). The catalytic subunit of Na+/K+ -ATPase is encoded by multiple genes. This gene encodes an alpha 2 subunit. Mutations in this gene result in familial basilar or hemiplegic migraines, and in a rare syndrome known as alternating hemiplegia of childhood. 477 ATPase Na+/K+ transporting subunit alpha 2
ENSG00000234964 FABP5P7 NA ENSG00000234964 fatty acid binding protein 5 pseudogene 7
ENSG00000143248 RGS5 This gene encodes a member of the regulators of G protein signaling (RGS) family. The RGS proteins are signal transduction molecules which are involved in the regulation of heterotrimeric G proteins by acting as GTPase activators. This gene is a hypoxia-inducible factor-1 dependent, hypoxia-induced gene which is involved in the induction of endothelial apoptosis. This gene is also one of three genes on chromosome 1q contributing to elevated blood pressure. Alternatively spliced transcript variants have been identified. 8490 regulator of G-protein signaling 5
ENSG00000128016 ZFP36 NA 7538 ZFP36 ring finger protein
ENSG00000127481 UBR4 The protein encoded by this gene is an E3 ubiquitin-protein ligase that interacts with the retinoblastoma-associated protein in the nucleus and with calcium-bound calmodulin in the cytoplasm. The encoded protein appears to be a cytoskeletal component in the cytoplasm and part of the chromatin scaffold in the nucleus. In addition, this protein is a target of the human papillomavirus type 16 E7 oncoprotein. 23352 ubiquitin protein ligase E3 component n-recognin 4
ENSG00000115677 HDLBP The protein encoded by this gene binds high density lipoprotein (HDL) and may function to regulate excess cholesterol levels in cells. The encoded protein also binds RNA and can induce heterochromatin formation. 3069 high density lipoprotein binding protein
ENSG00000078618 NRDC This gene encodes a zinc-dependent endopeptidase that cleaves peptide substrates at the N-terminus of arginine residues in dibasic moieties and is a member of the peptidase M16 family. This protein interacts with heparin-binding EGF-like growth factor and plays a role in cell migration and proliferation. Multiple transcript variants encoding different isoforms have been found for this gene. 4898 nardilysin convertase
ENSG00000026508 CD44 The protein encoded by this gene is a cell-surface glycoprotein involved in cell-cell interactions, cell adhesion and migration. It is a receptor for hyaluronic acid (HA) and can also interact with other ligands, such as osteopontin, collagens, and matrix metalloproteinases (MMPs). This protein participates in a wide variety of cellular functions including lymphocyte activation, recirculation and homing, hematopoiesis, and tumor metastasis. Transcripts for this gene undergo complex alternative splicing that results in many functionally distinct isoforms, however, the full length nature of some of these variants has not been determined. Alternative splicing is the basis for the structural and functional diversity of this protein, and may be related to tumor metastasis. 960 CD44 molecule (Indian blood group)
ENSG00000105655 ISYNA1 This gene encodes an inositol-3-phosphate synthase enzyme. The encoded protein plays a critical role in the myo-inositol biosynthesis pathway by catalyzing the rate-limiting conversion of glucose 6-phosphate to myoinositol 1-phosphate. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene, and a pseudogene of this gene is located on the short arm of chromosome 4. 51477 inositol-3-phosphate synthase 1
ENSG00000169710 FASN The enzyme encoded by this gene is a multifunctional protein. Its main function is to catalyze the synthesis of palmitate from acetyl-CoA and malonyl-CoA, in the presence of NADPH, into long-chain saturated fatty acids. In some cancer cell lines, this protein has been found to be fused with estrogen receptor-alpha (ER-alpha), in which the N-terminus of FAS is fused in-frame with the C-terminus of ER-alpha. 2194 fatty acid synthase
ENSG00000110713 NUP98 Nuclear pore complexes (NPCs) regulate the transport of macromolecules between the nucleus and cytoplasm, and are composed of many polypeptide subunits, many of which belong to the nucleoporin family. This gene belongs to the nucleoporin gene family and encodes a 186 kDa precursor protein that undergoes autoproteolytic cleavage to generate a 98 kDa nucleoporin and 96 kDa nucleoporin. The 98 kDa nucleoporin contains a Gly-Leu-Phe-Gly (GLGF) repeat domain and participates in many cellular processes, including nuclear import, nuclear export, mitotic progression, and regulation of gene expression. The 96 kDa nucleoporin is a scaffold component of the NPC. Proteolytic cleavage is important for targeting of the proteins to the NPC. Translocations between this gene and many other partner genes have been observed in different leukemias. Rearrangements typically result in chimeras with the N-terminal GLGF domain of this gene to the C-terminus of the partner gene. Alternative splicing results in multiple transcript variants encoding different isoforms, at least two of which are proteolytically processed. Some variants lack the region that encodes the 96 kDa nucleoporin. 4928 nucleoporin 98
ENSG00000188643 S100A16 NA 140576 S100 calcium binding protein A16
ENSG00000197747 S100A10 The protein encoded by this gene is a member of the S100 family of proteins containing 2 EF-hand calcium-binding motifs. S100 proteins are localized in the cytoplasm and/or nucleus of a wide range of cells, and involved in the regulation of a number of cellular processes such as cell cycle progression and differentiation. S100 genes include at least 13 members which are located as a cluster on chromosome 1q21. This protein may function in exocytosis and endocytosis. 6281 S100 calcium binding protein A10
write.table(as.factor(out$query),
            paste0("../utilities/GTEX2013_sparse_load_sqrt/gene_names_clus_", 1, ".txt"),
            col.names = FALSE, row.names = FALSE, quote = FALSE);
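
The export step is repeated for each factor with only the factor index changing. A minimal sketch of how it could be wrapped in a helper is shown here. The function name `write_factor_genes`, the toy matrix `toy_gene_list`, and the use of a temporary directory are assumptions made for illustration; they are not part of the original analysis, which writes to `../utilities/GTEX2013_sparse_load_sqrt/`.

```r
# Hypothetical helper (not in the original analysis): write one factor's
# gene IDs to <out_dir>/gene_names_clus_<k>.txt, one ID per line.
write_factor_genes <- function(genes, k, out_dir) {
  out_file <- file.path(out_dir, paste0("gene_names_clus_", k, ".txt"))
  write.table(genes, out_file,
              col.names = FALSE, row.names = FALSE, quote = FALSE)
  out_file
}

# Toy stand-in for gene_list (one row per factor), using a few Ensembl IDs
# that appear in the annotation tables of this report.
toy_gene_list <- rbind(c("ENSG00000171401", "ENSG00000170477"),
                       c("ENSG00000122304", "ENSG00000175646"))

# Assumption: a temporary directory is used so the sketch runs anywhere.
out_dir <- tempdir()
files <- vapply(seq_len(nrow(toy_gene_list)),
                function(k) write_factor_genes(toy_gene_list[k, ], k, out_dir),
                character(1))
```

In the real analysis, the `genes` argument would be `out$query` from the `mygene::queryMany` result for factor `k`, and `out_dir` would point at the utilities folder used above.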

Factor 2 Annotations

out <- mygene::queryMany(gene_list[2,],  scopes="ensembl.gene", fields=c("name", "summary", "symbol"), species="human");
## Finished
## Pass returnall=TRUE to return lists of duplicate or missing query terms.
kable(as.data.frame(out))
summary X_id symbol name query
Protamines substitute for histones in the chromatin of sperm during the haploid phase of spermatogenesis, and are the major DNA-binding proteins in the nucleus of sperm in many vertebrates. They package the sperm DNA into a highly condensed complex in a volume less than 5% of a somatic cell nucleus. Many mammalian species have only one protamine (protamine 1); however, a few species, including human and mouse, have two. This gene encodes protamine 2, which is cleaved to give rise to a family of protamine 2 peptides. Alternatively spliced transcript variants have also been found for this gene. 5620 PRM2 protamine 2 ENSG00000122304
NA 5619 PRM1 protamine 1 ENSG00000175646
This gene encodes fibronectin, a glycoprotein present in a soluble dimeric form in plasma, and in a dimeric or multimeric form at the cell surface and in extracellular matrix. The encoded preproprotein is proteolytically processed to generate the mature protein. Fibronectin is involved in cell adhesion and migration processes including embryogenesis, wound healing, blood coagulation, host defense, and metastasis. The gene has three regions subject to alternative splicing, with the potential to produce 20 different transcript variants, at least one of which encodes an isoform that undergoes proteolytic processing. The full-length nature of some variants has not been determined. 2335 FN1 fibronectin 1 ENSG00000115414
Spermatogenesis is a complex process regulated by extracellular and intracellular factors as well as cellular interactions among interstitial cells of the testis, Sertoli cells, and germ cells. This gene is expressed in the testis in Sertoli cells but not germ cells. The protein encoded by this gene contains plant homeodomain (PHD) finger domains, also known as leukemia associated protein (LAP) domains, believed to be involved in transcriptional regulation. The protein, which localizes to the nucleus of transfected cells, has been implicated in the transcriptional regulation of spermatogenesis. Alternate splicing results in multiple transcript variants of this gene. 51533 PHF7 PHD finger protein 7 ENSG00000010318
The protein encoded by this gene is a member of the keratin gene family. The keratins are intermediate filament proteins responsible for the structural integrity of epithelial cells and are subdivided into cytokeratins and hair keratins. Most of the type I cytokeratins consist of acidic proteins which are arranged in pairs of heterotypic keratin chains. This type I cytokeratin is paired with keratin 4 and expressed in the suprabasal layers of non-cornified stratified epithelia. Mutations in this gene and keratin 4 have been associated with the autosomal dominant disorder White Sponge Nevus. The type I cytokeratins are clustered in a region of chromosome 17q21.2. Alternative splicing of this gene results in multiple transcript variants; however, not all variants have been described. 3860 KRT13 keratin 13 ENSG00000171401
This gene encodes a member of the globin superfamily and is expressed in skeletal and cardiac muscles. The encoded protein is a haemoprotein contributing to intracellular oxygen storage and transcellular facilitated diffusion of oxygen. At least three alternatively spliced transcript variants encoding the same protein have been reported. 4151 MB myoglobin ENSG00000198125
This gene encodes a muscle-specific class III intermediate filament. Homopolymers of this protein form a stable intracytoplasmic filamentous network connecting myofibrils to each other and to the plasma membrane. Mutations in this gene are associated with desmin-related myopathy, a familial cardiac and skeletal myopathy (CSM), and with distal myopathies. 1674 DES desmin ENSG00000175084
Muscle myosin is a hexameric protein containing 2 heavy chain subunits, 2 alkali light chain subunits, and 2 regulatory light chain subunits. This gene encodes the beta (or slow) heavy chain subunit of cardiac myosin. It is expressed predominantly in normal human ventricle. It is also expressed in skeletal muscle tissues rich in slow-twitch type I muscle fibers. Changes in the relative abundance of this protein and the alpha (or fast) heavy subunit of cardiac myosin correlate with the contractile velocity of cardiac muscle. Its expression is also altered during thyroid hormone depletion and hemodynamic overloading. Mutations in this gene are associated with familial hypertrophic cardiomyopathy, myosin storage myopathy, dilated cardiomyopathy, and Laing early-onset distal myopathy. 4625 MYH7 myosin, heavy chain 7, cardiac muscle, beta ENSG00000092054
This gene encodes a large abundant protein of striated muscle. The product of this gene is divided into two regions, a N-terminal I-band and a C-terminal A-band. The I-band, which is the elastic part of the molecule, contains two regions of tandem immunoglobulin domains on either side of a PEVK region that is rich in proline, glutamate, valine and lysine. The A-band, which is thought to act as a protein-ruler, contains a mixture of immunoglobulin and fibronectin repeats, and possesses kinase activity. An N-terminal Z-disc region and a C-terminal M-line region bind to the Z-line and M-line of the sarcomere, respectively, so that a single titin molecule spans half the length of a sarcomere. Titin also contains binding sites for muscle associated proteins so it serves as an adhesion template for the assembly of contractile machinery in muscle cells. It has also been identified as a structural protein for chromosomes. Alternative splicing of this gene results in multiple transcript variants. Considerable variability exists in the I-band, the M-line and the Z-disc regions of titin. Variability in the I-band region contributes to the differences in elasticity of different titin isoforms and, therefore, to the differences in elasticity of different muscle types. Mutations in this gene are associated with familial hypertrophic cardiomyopathy 9, and autoantibodies to titin are produced in patients with the autoimmune disease scleroderma. 7273 TTN titin ENSG00000155657
This gene encodes one of the major intermediate filament proteins of mature astrocytes. It is used as a marker to distinguish astrocytes from other glial cells during development. Mutations in this gene cause Alexander disease, a rare disorder of astrocytes in the central nervous system. Alternative splicing results in multiple transcript variants encoding distinct isoforms. 2670 GFAP glial fibrillary acidic protein ENSG00000131095
The outer dense fibers are cytoskeletal structures that surround the axoneme in the middle piece and principal piece of the sperm tail. The fibers function in maintaining the elastic structure and recoil of the sperm tail as well as in protecting the tail from shear forces during epididymal transport and ejaculation. Defects in the outer dense fibers lead to abnormal sperm morphology and infertility. This gene encodes one of the major outer dense fiber proteins. Alternative splicing results in multiple transcript variants. The longer transcripts, also known as ‘Cenexins’, encode proteins with a C-terminal extension that are differentially targeted to somatic centrioles and thought to be crucial for the formation of microtubule organizing centers. 4957 ODF2 outer dense fiber of sperm tails 2 ENSG00000136811
NA 81691 LOC81691 exonuclease NEF-sp ENSG00000005189
This gene encodes a lysine-specific histone demethylase that belongs to the jumonji/ARID domain-containing family of histone demethylases. The encoded protein is capable of demethylating tri-, di- and monomethylated lysine 4 of histone H3. This protein plays a role in the transcriptional repression of certain tumor suppressor genes and is upregulated in certain cancer cells. This protein may also play a role in genome stability and DNA repair. Alternate splicing results in multiple transcript variants. 10765 KDM5B lysine demethylase 5B ENSG00000117139
This gene encodes one of six different actin proteins. Actins are highly conserved proteins that are involved in cell motility, structure, and integrity. This actin is a major constituent of the contractile apparatus and one of the two nonmuscle cytoskeletal actins. 60 ACTB actin, beta ENSG00000075624
This gene belongs to the ATP-ases associated with diverse cellular activities (AAA+) superfamily. Members of this superfamily form ring-shaped homo-hexamers and have highly conserved ATPase domains that are involved in various processes including DNA replication, protein degradation and reactivation of misfolded proteins. All members of this family hydrolyze ATP through their AAA+ domains and use the energy generated through ATP hydrolysis to exert mechanical force on their substrates. In addition to an AAA+ domain, the protein encoded by this gene contains a C-terminal D2 domain, which is characteristic of the AAA+ subfamily of Caseinolytic peptidases to which this protein belongs. It cooperates with Hsp70 in the disaggregation of protein aggregates. Allelic variants of this gene are associated with 3-methylglutaconic aciduria, which causes cataracts and neutropenia. Alternative splicing results in multiple transcript variants. 81570 CLPB ClpB homolog, mitochondrial AAA ATPase chaperonin ENSG00000162129
This gene encodes one of three related filamin genes, specifically gamma filamin. These filamin proteins crosslink actin filaments into orthogonal networks in cortical cytoplasm and participate in the anchoring of membrane proteins for the actin cytoskeleton. Three functional domains exist in filamin: an N-terminal filamentous actin-binding domain, a C-terminal self-association domain, and a membrane glycoprotein-binding domain. Two transcript variants encoding different isoforms have been found for this gene. 2318 FLNC filamin C ENSG00000128591
The protein encoded by this gene interacts with components of the origin recognition complex (ORC) and regulates the formation of the prereplicative complex. The encoded protein stabilizes the ORC and therefore aids in DNA replication. This protein is required for the G1/S phase transition of the cell cycle. In addition, the encoded protein binds to trimethylated histone H3 in heterochromatin and recruits the ORC and lysine methyltransferases, which help maintain the repressive heterochromatic state. Two transcript variants encoding different isoforms have been found for this gene. 222229 LRWD1 leucine rich repeats and WD repeat domain containing 1 ENSG00000161036
This gene encodes nebulin, a giant protein component of the cytoskeletal matrix that coexists with the thick and thin filaments within the sarcomeres of skeletal muscle. In most vertebrates, nebulin accounts for 3 to 4% of the total myofibrillar protein. The encoded protein contains approximately 30-amino acid long modules that can be classified into 7 types and other repeated modules. Protein isoform sizes vary from 600 to 800 kD due to alternative splicing that is tissue-, species-, and developmental stage-specific. Of the 183 exons in the nebulin gene, at least 43 are alternatively spliced, although exons 143 and 144 are not found in the same transcript. Of the several thousand transcript variants predicted for nebulin, the RefSeq Project has decided to create three representative RefSeq records. Mutations in this gene are associated with recessive nemaline myopathy. 4703 NEB nebulin ENSG00000183091
NA 128229 TSACC TSSK6 activating cochaperone ENSG00000163467
The protein encoded by this gene belongs to the natriuretic peptide family. Natriuretic peptides are implicated in the control of extracellular fluid volume and electrolyte homeostasis. This protein is synthesized as a large precursor (containing a signal peptide), which is processed to release a peptide from the N-terminus with similarity to vasoactive peptide, cardiodilatin, and another peptide from the C-terminus with natriuretic-diuretic activity. Mutations in this gene have been associated with atrial fibrillation familial type 6. This gene is located adjacent to another member of the natriuretic family of peptides on chromosome 1. 4878 NPPA natriuretic peptide A ENSG00000175206
The protein encoded by this gene belongs to the glutamine synthetase family. It catalyzes the synthesis of glutamine from glutamate and ammonia in an ATP-dependent reaction. This protein plays a role in ammonia and glutamate detoxification, acid-base homeostasis, cell signaling, and cell proliferation. Glutamine is an abundant amino acid, and is important to the biosynthesis of several amino acids, pyrimidines, and purines. Mutations in this gene are associated with congenital glutamine deficiency, and overexpression of this gene was observed in some primary liver cancer samples. There are six pseudogenes of this gene found on chromosomes 2, 5, 9, 11, and 12. Alternative splicing results in multiple transcript variants. 2752 GLUL glutamate-ammonia ligase ENSG00000135821
This gene product belongs to the glutathione peroxidase family, which functions in the detoxification of hydrogen peroxide. It contains a selenocysteine (Sec) residue at its active site. The selenocysteine is encoded by the UGA codon, which normally signals translation termination. The 3’ UTR of Sec-containing genes have a common stem-loop structure, the sec insertion sequence (SECIS), which is necessary for the recognition of UGA as a Sec codon rather than as a stop signal. 2878 GPX3 glutathione peroxidase 3 ENSG00000211445
HLA-B belongs to the HLA class I heavy chain paralogues. This class I molecule is a heterodimer consisting of a heavy chain and a light chain (beta-2 microglobulin). The heavy chain is anchored in the membrane. Class I molecules play a central role in the immune system by presenting peptides derived from the endoplasmic reticulum lumen. They are expressed in nearly all cells. The heavy chain is approximately 45 kDa and its gene contains 8 exons. Exon 1 encodes the leader peptide, exon 2 and 3 encode the alpha1 and alpha2 domains, which both bind the peptide, exon 4 encodes the alpha3 domain, exon 5 encodes the transmembrane region and exons 6 and 7 encode the cytoplasmic tail. Polymorphisms within exon 2 and exon 3 are responsible for the peptide binding specificity of each class one molecule. Typing for these polymorphisms is routinely done for bone marrow and kidney transplantation. Hundreds of HLA-B alleles have been described. 3106 HLA-B major histocompatibility complex, class I, B ENSG00000234745
The protein encoded by this gene is similar to proacrosin binding protein sp32 precursor found in mouse, guinea pig, and pig. This protein is located in the sperm acrosome and is thought to function as a binding protein to proacrosin for packaging and condensation of the acrosin zymogen in the acrosomal matrix. This protein is a member of the cancer/testis family of antigens and it is found to be immunogenic. In normal tissues, this mRNA is expressed only in testis, whereas it is detected in a range of different tumor types such as bladder, breast, lung, liver, and colon. 84519 ACRBP acrosin binding protein ENSG00000111644
NA ENSG00000229732 AC019349.5 NA ENSG00000229732
This gene encodes the largest subunit of RNA polymerase II, the polymerase responsible for synthesizing messenger RNA in eukaryotes. The product of this gene contains a carboxy terminal domain composed of heptapeptide repeats that are essential for polymerase activity. These repeats contain serine and threonine residues that are phosphorylated in actively transcribing RNA polymerase. In addition, this subunit, in combination with several other polymerase subunits, forms the DNA binding domain of the polymerase, a groove in which the DNA template is transcribed into RNA. 5430 POLR2A polymerase (RNA) II subunit A ENSG00000181222
The protein encoded by this gene is a member of the PP2C family of Ser/Thr protein phosphatases. PP2C family members are known to be negative regulators of cell stress response pathways. This phosphatase is found to be responsible for the dephosphorylation of Pre-mRNA splicing factors, which is important for the formation of functional spliceosome. Studies of a similar gene in mice suggested a role of this phosphatase in regulating cell cycle progression. 5496 PPM1G protein phosphatase, Mg2+/Mn2+ dependent 1G ENSG00000115241
This gene is proposed to play a role in cerebral cortical development. Mutations in this gene have been associated with microencephaly, cortical malformations, and mental retardation. Alternative splicing results in multiple transcript variants. 284403 WDR62 WD repeat domain 62 ENSG00000075702
This gene encodes a protein belonging to the glyceraldehyde-3-phosphate dehydrogenase family of enzymes that play an important role in carbohydrate metabolism. Like its somatic cell counterpart, this sperm-specific enzyme functions in a nicotinamide adenine dinucleotide-dependent manner to remove hydrogen and add phosphate to glyceraldehyde 3-phosphate to form 1,3-diphosphoglycerate. During spermiogenesis, this enzyme may play an important role in regulating the switch between different energy-producing pathways, and it is required for sperm motility and male fertility. 26330 GAPDHS glyceraldehyde-3-phosphate dehydrogenase, spermatogenic ENSG00000105679
NA ENSG00000219435 TEX40 testis expressed 40 ENSG00000219435
NA 6707 SPRR3 small proline rich protein 3 ENSG00000163209
NA 64753 CCDC136 coiled-coil domain containing 136 ENSG00000128596
The protein encoded by this gene is the tropomyosin-binding subunit of the troponin complex, which is located on the thin filament of striated muscles and regulates muscle contraction in response to alterations in intracellular calcium ion concentration. Mutations in this gene have been associated with familial hypertrophic cardiomyopathy as well as with dilated cardiomyopathy. Transcripts for this gene undergo alternative splicing that results in many tissue-specific isoforms, however, the full-length nature of some of these variants has not yet been determined. 7139 TNNT2 troponin T2, cardiac type ENSG00000118194
The protein encoded by this gene is a member of the S100 family of proteins containing 2 EF-hand calcium-binding motifs. S100 proteins are localized in the cytoplasm and/or nucleus of a wide range of cells, and involved in the regulation of a number of cellular processes such as cell cycle progression and differentiation. S100 genes include at least 13 members which are located as a cluster on chromosome 1q21. This protein may function in the inhibition of casein kinase and altered expression of this protein is associated with the disease cystic fibrosis. This antimicrobial protein exhibits antifungal and antibacterial activity. 6280 S100A9 S100 calcium binding protein A9 ENSG00000163220
NA 100129518 LOC100129518 uncharacterized LOC100129518 ENSG00000112096
This gene is a member of the iron/manganese superoxide dismutase family. It encodes a mitochondrial protein that forms a homotetramer and binds one manganese ion per subunit. This protein binds to the superoxide byproducts of oxidative phosphorylation and converts them to hydrogen peroxide and diatomic oxygen. Mutations in this gene have been associated with idiopathic cardiomyopathy (IDC), premature aging, sporadic motor neuron disease, and cancer. Alternative splicing of this gene results in multiple transcript variants. A related pseudogene has been identified on chromosome 1. 6648 SOD2 superoxide dismutase 2, mitochondrial ENSG00000112096
This gene encodes a highly conserved cold shock domain protein that has broad nucleic acid binding properties. The encoded protein functions as both a DNA and RNA binding protein and has been implicated in numerous cellular processes including regulation of transcription and translation, pre-mRNA splicing, DNA reparation and mRNA packaging. This protein is also a component of messenger ribonucleoprotein (mRNP) complexes and may have a role in microRNA processing. This protein can be secreted through non-classical pathways and functions as an extracellular mitogen. Aberrant expression of the gene is associated with cancer proliferation in numerous tissues. This gene may be a prognostic marker for poor outcome and drug resistance in certain cancers. Alternate splicing results in multiple transcript variants. Pseudogenes of this gene are found on multiple chromosomes. 4904 YBX1 Y-box binding protein 1 ENSG00000065978
This gene encodes a member of the DnaJ or Hsp40 (heat shock protein 40 kD) family of proteins. DNAJ family members are characterized by a highly conserved amino acid stretch called the ‘J-domain’ and function as one of the two major classes of molecular chaperones involved in a wide range of cellular events, such as protein folding and oligomeric protein complex assembly. The encoded protein is a molecular chaperone that stimulates the ATPase activity of Hsp70 heat-shock proteins in order to promote protein folding and prevent misfolded protein aggregation. Alternative splicing results in multiple transcript variants. 3337 DNAJB1 DnaJ heat shock protein family (Hsp40) member B1 ENSG00000132002
The protein encoded by this gene belongs to the superfamily of small heat-shock proteins containing a conservative alpha-crystallin domain at the C-terminal part of the molecule. The expression of this gene is induced by estrogen in estrogen receptor-positive breast cancer cells, and this protein also functions as a chaperone in association with Bag3, a stimulator of macroautophagy. Thus, this gene appears to be involved in regulation of cell proliferation, apoptosis, and carcinogenesis, and mutations in this gene have been associated with different neuromuscular diseases, including Charcot-Marie-Tooth disease. 26353 HSPB8 heat shock protein family B (small) member 8 ENSG00000152137
This gene encodes an inositol-3-phosphate synthase enzyme. The encoded protein plays a critical role in the myo-inositol biosynthesis pathway by catalyzing the rate-limiting conversion of glucose 6-phosphate to myoinositol 1-phosphate. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene, and a pseudogene of this gene is located on the short arm of chromosome 4. 51477 ISYNA1 inositol-3-phosphate synthase 1 ENSG00000105655
This gene encodes a protein containing a MYND-type zinc finger domain that likely functions in assembly of the dynein motor. Mutations in this gene can cause primary ciliary dyskinesia. This gene is also considered a tumor suppressor gene and is often mutated, deleted, or hypermethylated and silenced in cancer cells. Alternative splicing results in multiple transcript variants. 51364 ZMYND10 zinc finger MYND-type containing 10 ENSG00000004838
This gene encodes a kinesin-like protein that functions as a microtubule-dependent molecular motor. The encoded protein can depolymerize microtubules at the plus end, thereby promoting mitotic chromosome segregation. Alternative splicing results in multiple transcript variants. 11004 KIF2C kinesin family member 2C ENSG00000142945
This gene encodes a member of the F-box protein family, members of which are characterized by an approximately 40 amino acid motif, the F-box. The F-box proteins constitute one of the four subunits of ubiquitin protein ligase complex called SCFs (SKP1-cullin-F-box), which function in phosphorylation-dependent ubiquitination. The F-box proteins are divided into three classes: Fbws containing WD-40 domains, Fbls containing leucine-rich repeats, and Fbxs containing either different protein-protein interaction modules or no recognizable motifs. The protein encoded by this gene contains WD-40 domains, in addition to an F-box motif, so it belongs to the Fbw class. Alternatively spliced transcript variants encoding distinct isoforms have been identified for this gene, however, they were found to be nonsense-mediated mRNA decay (NMD) candidates, hence not represented. 54461 FBXW5 F-box and WD repeat domain containing 5 ENSG00000159069
This gene encodes beta-tropomyosin, a member of the actin filament binding protein family, and mainly expressed in slow, type 1 muscle fibers. Mutations in this gene can alter the expression of other sarcomeric tropomyosin proteins, and cause cap disease, nemaline myopathy and distal arthrogryposis syndromes. Alternatively spliced transcript variants encoding different isoforms have been found for this gene. 7169 TPM2 tropomyosin 2 (beta) ENSG00000198467
NA 8404 SPARCL1 SPARC like 1 ENSG00000152583
This gene encodes a major cytoplasmic protein which is the only known constituent common to submembranous plaques of both desmosomes and intermediate junctions. This protein forms distinct complexes with cadherins and desmosomal cadherins and is a member of the catenin family since it contains a distinct repeating amino acid motif called the armadillo repeat. Mutation in this gene has been associated with Naxos disease. Alternative splicing occurs in this gene; however, not all transcripts have been fully described. 3728 JUP junction plakoglobin ENSG00000173801
This gene encodes a member of the alpha tubulin family. Tubulin is a major component of microtubules, which are composed of alpha- and beta-tubulin heterodimers and microtubule-associated proteins in the cytoskeleton. Microtubules maintain cellular structure, function in intracellular transport, and play a role in spindle formation during mitosis. 113457 TUBA3D tubulin alpha 3d ENSG00000075886
The product encoded by this gene belongs to the actin family of proteins, which are highly conserved proteins that play a role in cell motility, structure and integrity. Alpha, beta and gamma actin isoforms have been identified, with alpha actins being a major constituent of the contractile apparatus, while beta and gamma actins are involved in the regulation of cell motility. This actin is an alpha actin that is found in skeletal muscle. Mutations in this gene cause nemaline myopathy type 3, congenital myopathy with excess of thin myofilaments, congenital myopathy with cores, and congenital myopathy with fiber-type disproportion, diseases that lead to muscle fiber defects. 58 ACTA1 actin, alpha 1, skeletal muscle ENSG00000143632
This gene encodes a member of the M14 family of metallocarboxypeptidases. The encoded preproprotein is proteolytically processed to generate the mature peptidase. This peripheral membrane protein cleaves C-terminal amino acid residues and is involved in the biosynthesis of peptide hormones and neurotransmitters, including insulin. This protein may also function independently of its peptidase activity, as a neurotrophic factor that promotes neuronal survival, and as a sorting receptor that binds to regulated secretory pathway proteins, including prohormones. Mutations in this gene are implicated in type 2 diabetes. 1363 CPE carboxypeptidase E ENSG00000109472
NA 84266 ALKBH7 alkB homolog 7 ENSG00000125652
This gene encodes a component of high density lipoprotein that has no marked similarity to other apolipoprotein sequences. It has a high degree of homology to plasma retinol-binding protein and other members of the alpha 2 microglobulin protein superfamily of carrier proteins, also known as lipocalins. This glycoprotein is closely associated with the enzyme lecithin:cholesterol acyltransferase - an enzyme involved in lipoprotein metabolism. 347 APOD apolipoprotein D ENSG00000189058
Nuclear pore complexes (NPCs) regulate the transport of macromolecules between the nucleus and cytoplasm, and are composed of many polypeptide subunits, many of which belong to the nucleoporin family. This gene belongs to the nucleoporin gene family and encodes a 186 kDa precursor protein that undergoes autoproteolytic cleavage to generate a 98 kDa nucleoporin and 96 kDa nucleoporin. The 98 kDa nucleoporin contains a Gly-Leu-Phe-Gly (GLFG) repeat domain and participates in many cellular processes, including nuclear import, nuclear export, mitotic progression, and regulation of gene expression. The 96 kDa nucleoporin is a scaffold component of the NPC. Proteolytic cleavage is important for targeting of the proteins to the NPC. Translocations between this gene and many other partner genes have been observed in different leukemias. Rearrangements typically result in chimeras with the N-terminal GLFG domain of this gene to the C-terminus of the partner gene. Alternative splicing results in multiple transcript variants encoding different isoforms, at least two of which are proteolytically processed. Some variants lack the region that encodes the 96 kDa nucleoporin. 4928 NUP98 nucleoporin 98 ENSG00000110713
This gene was first characterized as part of a cluster of genes located within the human major histocompatibility complex class III region. This gene encodes a nuclear protein that is cleaved by caspase 3 and is implicated in the control of apoptosis. In addition, the protein forms a complex with E1A binding protein p300 and is required for the acetylation of p53 in response to DNA damage. Multiple transcript variants encoding different isoforms have been found for this gene. 7917 BAG6 BCL2 associated athanogene 6 ENSG00000204463
This gene encodes a member of the myosin-binding protein C family. Myosin-binding protein C family members are myosin-associated proteins found in the cross-bridge-bearing zone (C region) of A bands in striated muscle. The encoded protein is the slow skeletal muscle isoform of myosin-binding protein C and plays an important role in muscle contraction by recruiting muscle-type creatine kinase to myosin filaments. Mutations in this gene are associated with distal arthrogryposis type I. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene. 4604 MYBPC1 myosin binding protein C, slow type ENSG00000196091
This gene belongs to the chemokine-like factor gene superfamily, a novel family that links the chemokine and the transmembrane 4 superfamilies of signaling molecules. The protein encoded by this gene may play an important role in testicular development. 146225 CMTM2 CKLF like MARVEL transmembrane domain containing 2 ENSG00000140932
NA ENSG00000211896 IGHG1 immunoglobulin heavy constant gamma 1 (G1m marker) ENSG00000211896
This gene encodes a member of A-kinase anchoring proteins (AKAPs), a family of functionally related proteins that target protein kinase A to discrete locations within the cell. The encoded protein is reported to participate in protein-protein interactions with the R-subunit of the protein kinase A as well as sperm-associated proteins. This protein is expressed in spermatozoa and localized to the acrosomal region of the sperm head as well as the length of the principal piece. It may function as a regulator of motility, capacitation, and the acrosome reaction. 10566 AKAP3 A-kinase anchoring protein 3 ENSG00000111254
The protein encoded by this gene is a member of the keratin gene family. The type II cytokeratins consist of basic or neutral proteins which are arranged in pairs of heterotypic keratin chains coexpressed during differentiation of simple and stratified epithelial tissues. As many as six of this type II cytokeratin (KRT6) have been identified; the multiplicity of the genes is attributed to successive gene duplication events. The genes are expressed with family members KRT16 and/or KRT17 in the filiform papillae of the tongue, the stratified epithelial lining of oral mucosa and esophagus, the outer root sheath of hair follicles, and the glandular epithelia. This KRT6 gene in particular encodes the most abundant isoform. Mutations in these genes have been associated with pachyonychia congenita. In addition, peptides from the C-terminal region of the protein have antimicrobial activity against bacterial pathogens. The type II cytokeratins are clustered in a region of chromosome 12q12-q13. 3853 KRT6A keratin 6A ENSG00000205420
NA ENSG00000242349 NPPA-AS1 NPPA antisense RNA 1 ENSG00000242349
This gene encodes a member of the glutathione peroxidase protein family. Glutathione peroxidase catalyzes the reduction of hydrogen peroxide, organic hydroperoxide, and lipid peroxides by reduced glutathione and functions in the protection of cells against oxidative damage. Human plasma glutathione peroxidase has been shown to be a selenium-containing enzyme and the UGA codon is translated into a selenocysteine. The encoded protein has been identified as a moonlighting protein based on its ability to serve dual functions as a peroxidase as well as a structural protein in mature spermatozoa. Through alternative splicing and transcription initiation, rat produces proteins that localize to the nucleus, mitochondrion, and cytoplasm. In humans, alternative transcription initiation and the cleavage sites of the mitochondrial and nuclear transit peptides need to be experimentally verified. Alternative splicing results in multiple transcript variants. 2879 GPX4 glutathione peroxidase 4 ENSG00000167468
This gene belongs to a highly conserved gene family encoding EPS15 homology (EH) domain-containing proteins. The protein-binding EH domain was first noted in EPS15, a substrate for the epidermal growth factor receptor. The EH domain has been shown to be an important motif in proteins involved in protein-protein interactions and in intracellular sorting. The protein encoded by this gene is thought to play a role in the endocytosis of IGF1 receptors. Alternatively spliced transcript variants have been found for this gene. 10938 EHD1 EH domain containing 1 ENSG00000110047
To reach fertilization competence, spermatozoa undergo a series of morphological and molecular maturational processes, termed capacitation, involving protein tyrosine phosphorylation and increased intracellular calcium. The protein encoded by this gene localizes to the principal piece of the sperm flagellum in association with the fibrous sheath and exhibits calcium-binding when phosphorylated during capacitation. A pseudogene on chromosome 3 has been identified for this gene. Alternatively spliced transcript variants encoding distinct protein isoforms have been found for this gene. 26256 CABYR calcium binding tyrosine phosphorylation regulated ENSG00000154040
Histones are basic nuclear proteins that are responsible for the nucleosome structure of the chromosomal fiber in eukaryotes. Nucleosomes consist of approximately 146 bp of DNA wrapped around a histone octamer composed of pairs of each of the four core histones (H2A, H2B, H3, and H4). The chromatin fiber is further compacted through the interaction of a linker histone, H1, with the DNA between the nucleosomes to form higher order chromatin structures. This gene is located on chromosome 12 and encodes a replication-independent histone that is a variant H2A histone. The protein is divergent at the C-terminus compared to the consensus H2A histone family member. This gene also encodes an antimicrobial peptide with antibacterial and antifungal activity. 55766 H2AFJ H2A histone family member J ENSG00000246705
The protein encoded by this gene is localized to the nucleus of endothelial cells and is induced by IL-1 and TNF-alpha stimulation. Studies in rat cardiomyocytes suggest that this gene functions as a transcription factor. Interactions between this protein and the sarcomeric proteins myopalladin and titin suggest that it may also be involved in the myofibrillar stretch-sensor system. 27063 ANKRD1 ankyrin repeat domain 1 ENSG00000148677
NA 147011 PROCA1 protein interacting with cyclin A1 ENSG00000167525
This gene encodes the second human homologue of the bacterial RuvB gene. Bacterial RuvB protein is a DNA helicase essential for homologous recombination and DNA double-strand break repair. Functional analysis showed that this gene product has both ATPase and DNA helicase activities. This gene is physically linked to the CGB/LHB gene cluster on chromosome 19q13.3, and is very close (55 nt) to the LHB gene, in the opposite orientation. 10856 RUVBL2 RuvB like AAA ATPase 2 ENSG00000183207
This gene encodes a muscle enzyme involved in glycogenolysis. Highly similar enzymes encoded by different genes are found in liver and brain. Mutations in this gene are associated with McArdle disease (myophosphorylase deficiency), a glycogen storage disease of muscle. Alternative splicing results in multiple transcript variants. 5837 PYGM phosphorylase, glycogen, muscle ENSG00000068976
This gene encodes the pro-alpha1 chains of type III collagen, a fibrillar collagen that is found in extensible connective tissues such as skin, lung, uterus, intestine and the vascular system, frequently in association with type I collagen. Mutations in this gene are associated with Ehlers-Danlos syndrome types IV, and with aortic and arterial aneurysms. Two transcripts, resulting from the use of alternate polyadenylation signals, have been identified for this gene. 1281 COL3A1 collagen type III alpha 1 chain ENSG00000168542
The import of proteins into the nucleus is a process that involves at least 2 steps. The first is an energy-independent docking of the protein to the nuclear envelope and the second is an energy-dependent translocation through the nuclear pore complex. Imported proteins require a nuclear localization sequence (NLS) which generally consists of a short region of basic amino acids or 2 such regions spaced about 10 amino acids apart. Proteins involved in the first step of nuclear import have been identified in different systems. These include the Xenopus protein importin and its yeast homolog, SRP1 (a suppressor of certain temperature-sensitive mutations of RNA polymerase I in Saccharomyces cerevisiae), which bind to the NLS. KPNA2 protein interacts with the NLSs of DNA helicase Q1 and SV40 T antigen and may be involved in the nuclear transport of proteins. KPNA2 also may play a role in V(D)J recombination. Alternative splicing results in multiple transcript variants. 3838 KPNA2 karyopherin subunit alpha 2 ENSG00000182481
This gene was identified as a gene whose expression can be induced by the tumor necrosis factor alpha (TNF) in umbilical vein endothelial cells. The expression of this gene was shown to be induced by retinoic acid in a cell line expressing a oncogenic version of the retinoic acid receptor alpha fusion protein, which suggested that this gene may be a retinoic acid target gene in acute promyelocytic leukemia. 7127 TNFAIP2 TNF alpha induced protein 2 ENSG00000185215
NA ENSG00000153363 LINC00467 long intergenic non-protein coding RNA 467 ENSG00000153363
The nuclear pore complex is a massive structure that extends across the nuclear envelope, forming a gateway that regulates the flow of macromolecules between the nucleus and the cytoplasm. Nucleoporins are the main components of the nuclear pore complex in eukaryotic cells. This gene is a member of the FG-repeat-containing nucleoporins. The protein encoded by this gene is localized to the cytoplasmic face of the nuclear pore complex where it is required for proper cell cycle progression and nucleocytoplasmic transport. The 3’ portion of this gene forms a fusion gene with the DEK gene on chromosome 6 in a t(6,9) translocation associated with acute myeloid leukemia and myelodysplastic syndrome. Alternative splicing of this gene results in multiple transcript variants encoding different isoforms. 8021 NUP214 nucleoporin 214 ENSG00000126883
The protein encoded by this gene mediates transcriptional control by interaction with the Kruppel-associated box repression domain found in many transcription factors. The protein localizes to the nucleus and is thought to associate with specific chromatin regions. The protein is a member of the tripartite motif family. This tripartite motif includes three zinc-binding domains, a RING, a B-box type 1 and a B-box type 2, and a coiled-coil region. 10155 TRIM28 tripartite motif containing 28 ENSG00000130726
Fibrosin is a lymphokine secreted by activated lymphocytes that induces fibroblast proliferation (Prakash and Robbins, 1998 [PubMed 9809749]). 64319 FBRS fibrosin ENSG00000156860
NA 51155 HN1 hematological and neurological expressed 1 ENSG00000189159
NA 51458 RHCG Rh family C glycoprotein ENSG00000140519
The nuclear pore complex (NPC) is found on the nuclear envelope and forms a gateway that regulates the flow of proteins and RNAs between the cytoplasm and nucleoplasm. The NPC is comprised of approximately 30 distinct proteins collectively known as nucleoporins. Nucleoporins are pore-complex-specific glycoproteins which often have cytoplasmically oriented O-linked N-acetylglucosamine residues and numerous repeats of the pentapeptide sequence XFXFG. However, the nucleoporin protein encoded by this gene does not contain the typical FG repeat sequences found in most vertebrate nucleoporins. This nucleoporin is thought to form part of the scaffold for the central channel of the nuclear pore. 23511 NUP188 nucleoporin 188 ENSG00000095319
The paired immunoglobin-like type 2 receptors consist of highly related activating and inhibitory receptors that are involved in the regulation of many aspects of the immune system. The paired immunoglobulin-like receptor genes are located in a tandem head-to-tail orientation on chromosome 7. This gene encodes the activating member of the receptor pair and contains a truncated cytoplasmic tail relative to its inhibitory counterpart (PILRA), that has a long cytoplasmic tail with immunoreceptor tyrosine-based inhibitory (ITIM) motifs. This gene is thought to have arisen from a duplication of the inhibitory PILRA gene and evolved to acquire its activating function. 29990 PILRB paired immunoglobin-like type 2 receptor beta ENSG00000121716
The protein encoded by this gene belongs to the family of P-type cation transport ATPases, and to the subfamily of aminophospholipid-transporting ATPases. The aminophospholipid translocases transport phosphatidylserine and phosphatidylethanolamine from one side of a bilayer to the other. This gene encodes member 3 of phospholipid-transporting ATPase 8B; other members of this protein family are located on chromosomes 1, 15 and 18. Alternatively spliced transcript variants encoding different isoforms have been found for this gene. 148229 ATP8B3 ATPase phospholipid transporting 8B3 ENSG00000130270
This gene encodes a member of the regulators of G protein signaling (RGS) family. The RGS proteins are signal transduction molecules which are involved in the regulation of heterotrimeric G proteins by acting as GTPase activators. This gene is a hypoxia-inducible factor-1 dependent, hypoxia-induced gene which is involved in the induction of endothelial apoptosis. This gene is also one of three genes on chromosome 1q contributing to elevated blood pressure. Alternatively spliced transcript variants have been identified. 8490 RGS5 regulator of G-protein signaling 5 ENSG00000143248
This gene encodes an oncoprotein which is thought to play a role in the phenotypic determination of hemopoietic cells. Translocations between this gene and nucleophosmin have been associated with myelodysplastic syndrome and acute myeloid leukemia. Multiple transcript variants encoding different isoforms have been found for this gene. 4291 MLF1 myeloid leukemia factor 1 ENSG00000178053
NA 60509 AGBL5 ATP/GTP binding protein-like 5 ENSG00000084693
NA 23589 CARHSP1 calcium regulated heat stable protein 1 ENSG00000153048
NA 140576 S100A16 S100 calcium binding protein A16 ENSG00000188643
Cytosolic and membrane-bound forms of glutathione S-transferase are encoded by two distinct supergene families. At present, eight distinct classes of the soluble cytoplasmic mammalian glutathione S-transferases have been identified: alpha, kappa, mu, omega, pi, sigma, theta and zeta. This gene encodes a glutathione S-transferase that belongs to the mu class. The mu class of enzymes functions in the detoxification of electrophilic compounds, including carcinogens, therapeutic drugs, environmental toxins and products of oxidative stress, by conjugation with glutathione. The genes encoding the mu class of enzymes are organized in a gene cluster on chromosome 1p13.3 and are known to be highly polymorphic. These genetic variations can change an individual’s susceptibility to carcinogens and toxins as well as affect the toxicity and efficacy of certain drugs. Mutations of this class mu gene have been linked with a slight increase in a number of cancers, likely due to exposure with environmental toxins. Alternative splicing results in multiple transcript variants. 2947 GSTM3 glutathione S-transferase mu 3 (brain) ENSG00000134202
The enzyme encoded by this gene is a multifunctional protein. Its main function is to catalyze the synthesis of palmitate from acetyl-CoA and malonyl-CoA, in the presence of NADPH, into long-chain saturated fatty acids. In some cancer cell lines, this protein has been found to be fused with estrogen receptor-alpha (ER-alpha), in which the N-terminus of FAS is fused in-frame with the C-terminus of ER-alpha. 2194 FASN fatty acid synthase ENSG00000169710
This gene encodes a member of the beta-transducin protein family. Most proteins of the beta-transducin family are involved in regulatory functions. This protein is possibly involved in some intracellular signaling pathway. This gene is deleted in Williams-Beuren syndrome, a developmental disorder caused by deletion of multiple genes at 7q11.23. 26608 TBL2 transducin (beta)-like 2 ENSG00000106638
NA 54535 CCHCR1 coiled-coil alpha-helical rod protein 1 ENSG00000204536
This gene encodes a conventional non-muscle myosin; this protein should not be confused with the unconventional myosin-9a or 9b (MYO9A or MYO9B). The encoded protein is a myosin IIA heavy chain that contains an IQ domain and a myosin head-like domain which is involved in several important functions, including cytokinesis, cell motility and maintenance of cell shape. Defects in this gene have been associated with non-syndromic sensorineural deafness autosomal dominant type 17, Epstein syndrome, Alport syndrome with macrothrombocytopenia, Sebastian syndrome, Fechtner syndrome and macrothrombocytopenia with progressive sensorineural deafness. 4627 MYH9 myosin, heavy chain 9, non-muscle ENSG00000100345
NA 200172 SLFNL1 schlafen like 1 ENSG00000171790
This gene encodes a member of the F-box protein family which is characterized by an approximately 40 amino acid motif, the F-box. The F-box proteins constitute one of the four subunits of the ubiquitin protein ligase complex called SCFs (SKP1-cullin-F-box), which function in phosphorylation-dependent ubiquitination. The F-box proteins are divided into 3 classes: Fbws containing WD-40 domains, Fbls containing leucine-rich repeats, and Fbxs containing either different protein-protein interaction modules or no recognizable motifs. The protein encoded by this gene belongs to the Fbxs class. Multiple transcript variants encoding different isoforms have been found for this gene. 26261 FBXO24 F-box protein 24 ENSG00000106336
NA 113177 IZUMO4 IZUMO family member 4 ENSG00000099840
Aminoacyl-tRNA synthetases catalyze the aminoacylation of tRNA by their cognate amino acid. Because of their central role in linking amino acids with nucleotide triplets contained in tRNAs, aminoacyl-tRNA synthetases are thought to be among the first proteins that appeared in evolution. The protein encoded by this gene belongs to class-I aminoacyl-tRNA synthetase family and is located in the class III region of the major histocompatibility complex. 7407 VARS valyl-tRNA synthetase ENSG00000204394
The p70/p80 autoantigen is a nuclear complex consisting of two subunits with molecular masses of approximately 70 and 80 kDa. The complex functions as a single-stranded DNA-dependent ATP-dependent helicase. The complex may be involved in the repair of nonhomologous DNA ends such as that required for double-strand break repair, transposition, and V(D)J recombination. High levels of autoantibodies to p70 and p80 have been found in some patients with systemic lupus erythematosus. 2547 XRCC6 X-ray repair cross complementing 6 ENSG00000196419
The protein encoded by this gene can function as a guanine nucleotide exchange factor (GEF) and may play a role in intracellular signaling and cytoskeleton dynamics at the Golgi apparatus. Polymorphisms in the region of this gene have been found to be associated with spinocerebellar ataxia in some study populations. Alternative splicing results in multiple transcript variants. 25894 PLEKHG4 pleckstrin homology and RhoGEF domain containing G4 ENSG00000196155
SLC6A16 shows structural characteristics of an Na(+)- and Cl(-)-dependent neurotransmitter transporter, including 12 transmembrane (TM) domains, intracellular N and C termini, and large extracellular loops containing multiple N-glycosylation sites. 28968 SLC6A16 solute carrier family 6 member 16 ENSG00000063127
This gene encodes the beta subunit of the mitochondrial trifunctional protein, which catalyzes the last three steps of mitochondrial beta-oxidation of long chain fatty acids. The mitochondrial membrane-bound heterocomplex is composed of four alpha and four beta subunits, with the beta subunit catalyzing the 3-ketoacyl-CoA thiolase activity. The encoded protein can also bind RNA and decreases the stability of some mRNAs. The genes of the alpha and beta subunits of the mitochondrial trifunctional protein are located adjacent to each other in the human genome in a head-to-head orientation. Mutations in this gene result in trifunctional protein deficiency. Alternatively spliced transcript variants encoding different isoforms have been described. 3032 HADHB hydroxyacyl-CoA dehydrogenase/3-ketoacyl-CoA thiolase/enoyl-CoA hydratase (trifunctional protein), beta subunit ENSG00000138029
This gene encodes a member of the F-box protein family which is characterized by an approximately 40 amino acid motif, the F-box. The F-box proteins constitute one of the four subunits of ubiquitin protein ligase complex called SCFs (SKP1-cullin-F-box), which function in phosphorylation-dependent ubiquitination. The F-box proteins are divided into 3 classes: Fbws containing WD-40 domains, Fbls containing leucine-rich repeats, and Fbxs containing either different protein-protein interaction modules or no recognizable motifs. The protein encoded by this gene belongs to the Fbls class and, in addition to an F-box, contains at least six highly degenerated leucine-rich repeats. This family member plays a role in epigenetic silencing. It nucleates at CpG islands and specifically demethylates both mono- and di-methylated lysine-36 of histone H3. Alternative splicing results in multiple transcript variants. 22992 KDM2A lysine demethylase 2A ENSG00000173120
DEAD box proteins, characterized by the conserved motif Asp-Glu-Ala-Asp (DEAD), are putative RNA helicases. They are implicated in a number of cellular processes involving alteration of RNA secondary structure such as translation initiation, nuclear and mitochondrial splicing, and ribosome and spliceosome assembly. Based on their distribution patterns, some members of this family are believed to be involved in embryogenesis, spermatogenesis, and cellular growth and division. This gene encodes a DEAD box protein, which has an ATPase activity and is a component of the survival of motor neurons (SMN) complex. This protein interacts directly with SMN, the spinal muscular atrophy gene product, and may play a catalytic role in the function of the SMN complex on RNPs. 11218 DDX20 DEAD-box helicase 20 ENSG00000064703
This gene encodes a nebulin like protein that is abundantly expressed in cardiac muscle. The encoded protein binds actin and interacts with thin filaments and Z-line associated proteins in striated muscle. This protein may be involved in cardiac myofibril assembly. A shorter isoform of this protein termed LIM nebulette is expressed in non-muscle cells and may function as a component of focal adhesion complexes. Alternate splicing results in multiple transcript variants. 10529 NEBL nebulette ENSG00000078114
NA 101927055 LOC101927055 uncharacterized LOC101927055 ENSG00000237298
NA 100506866 TTN-AS1 TTN antisense RNA 1 ENSG00000237298
write.table(as.factor(out$query), paste0("../utilities/GTEX2013_sparse_load_sqrt/gene_names_clus_",2,".txt"), col.names = FALSE,
            row.names=FALSE, quote=FALSE);
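
A single Ensembl ID can come back from the query as more than one row (ENSG00000237298 appears twice in the table above), and some queries also return a `notfound` flag for IDs with no match. The following is a minimal sketch, not the method used in this analysis, of how the ID list could be deduplicated and filtered before it is written out. The helper `clean_query_ids` and the toy data frame `out_toy` are hypothetical names introduced for illustration; the toy rows only mimic the `query`, `symbol` and `notfound` columns that appear in the tables of this report.

```r
# Toy stand-in for as.data.frame(mygene::queryMany(...)); no network call is made.
out_toy <- data.frame(
  query    = c("ENSG00000237298", "ENSG00000237298",
               "ENSG00000259716", "ENSG00000155657"),
  symbol   = c("LOC101927055", "TTN-AS1", NA, "TTN"),
  notfound = c(NA, NA, TRUE, NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep each matched query ID once and drop unmatched IDs.
clean_query_ids <- function(out) {
  if ("notfound" %in% names(out)) {
    # A row counts as found when the flag is missing (NA) or FALSE.
    found <- is.na(out$notfound) | !out$notfound
  } else {
    found <- rep(TRUE, nrow(out))
  }
  unique(as.character(out$query[found]))
}

ids <- clean_query_ids(out_toy)

# Write one ID per line to a temporary file (the path is illustrative only).
tmp <- tempfile(fileext = ".txt")
writeLines(ids, tmp)
```

Whether duplicates and unmatched IDs should be dropped depends on what the downstream tool expects; if it needs the original list of top features, the unfiltered `out$query` remains the right input.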

Factor 3 Annotations

out <- mygene::queryMany(gene_list[3,],  scopes="ensembl.gene", fields=c("name", "summary", "symbol"), species="human");
## Finished
## Pass returnall=TRUE to return lists of duplicate or missing query terms.
kable(as.data.frame(out))
X_id summary name symbol query notfound
72 Actins are highly conserved proteins that are involved in various types of cell motility and in the maintenance of the cytoskeleton. Three types of actins, alpha, beta and gamma, have been identified in vertebrates. Alpha actins are found in muscle tissues and are a major constituent of the contractile apparatus. The beta and gamma actins co-exist in most cell types as components of the cytoskeleton and as mediators of internal cell motility. This gene encodes actin gamma 2; a smooth muscle actin found in enteric tissues. Alternative splicing results in multiple transcript variants encoding distinct isoforms. Based on similarity to peptide cleavage of related actins, the mature protein of this gene is formed by removal of two N-terminal peptides. actin, gamma 2, smooth muscle, enteric ACTG2 ENSG00000163017 NA
59 The protein encoded by this gene belongs to the actin family of proteins, which are highly conserved proteins that play a role in cell motility, structure and integrity. Alpha, beta and gamma actin isoforms have been identified, with alpha actins being a major constituent of the contractile apparatus, while beta and gamma actins are involved in the regulation of cell motility. This actin is an alpha actin that is found in skeletal muscle. Defects in this gene cause aortic aneurysm familial thoracic type 6. Multiple alternatively spliced variants, encoding the same protein, have been identified. actin, alpha 2, smooth muscle, aorta ACTA2 ENSG00000107796 NA
7273 This gene encodes a large abundant protein of striated muscle. The product of this gene is divided into two regions, a N-terminal I-band and a C-terminal A-band. The I-band, which is the elastic part of the molecule, contains two regions of tandem immunoglobulin domains on either side of a PEVK region that is rich in proline, glutamate, valine and lysine. The A-band, which is thought to act as a protein-ruler, contains a mixture of immunoglobulin and fibronectin repeats, and possesses kinase activity. An N-terminal Z-disc region and a C-terminal M-line region bind to the Z-line and M-line of the sarcomere, respectively, so that a single titin molecule spans half the length of a sarcomere. Titin also contains binding sites for muscle associated proteins so it serves as an adhesion template for the assembly of contractile machinery in muscle cells. It has also been identified as a structural protein for chromosomes. Alternative splicing of this gene results in multiple transcript variants. Considerable variability exists in the I-band, the M-line and the Z-disc regions of titin. Variability in the I-band region contributes to the differences in elasticity of different titin isoforms and, therefore, to the differences in elasticity of different muscle types. Mutations in this gene are associated with familial hypertrophic cardiomyopathy 9, and autoantibodies to titin are produced in patients with the autoimmune disease scleroderma. titin TTN ENSG00000155657 NA
3043 The alpha (HBA) and beta (HBB) loci determine the structure of the 2 types of polypeptide chains in adult hemoglobin, Hb A. The normal adult hemoglobin tetramer consists of two alpha chains and two beta chains. Mutant beta globin causes sickle cell anemia. Absence of beta chain causes beta-zero-thalassemia. Reduced amounts of detectable beta globin causes beta-plus-thalassemia. The order of the genes in the beta-globin cluster is 5’-epsilon – gamma-G – gamma-A – delta – beta–3’. hemoglobin subunit beta HBB ENSG00000244734 NA
1291 The collagens are a superfamily of proteins that play a role in maintaining the integrity of various tissues. Collagens are extracellular matrix proteins and have a triple-helical domain as their common structural element. Collagen VI is a major structural component of microfibrils. The basic structural unit of collagen VI is a heterotrimer of the alpha1(VI), alpha2(VI), and alpha3(VI) chains. The alpha2(VI) and alpha3(VI) chains are encoded by the COL6A2 and COL6A3 genes, respectively. The protein encoded by this gene is the alpha 1 subunit of type VI collagen (alpha1(VI) chain). Mutations in the genes that code for the collagen VI subunits result in the autosomal dominant disorder, Bethlem myopathy. collagen type VI alpha 1 COL6A1 ENSG00000142156 NA
58 The product encoded by this gene belongs to the actin family of proteins, which are highly conserved proteins that play a role in cell motility, structure and integrity. Alpha, beta and gamma actin isoforms have been identified, with alpha actins being a major constituent of the contractile apparatus, while beta and gamma actins are involved in the regulation of cell motility. This actin is an alpha actin that is found in skeletal muscle. Mutations in this gene cause nemaline myopathy type 3, congenital myopathy with excess of thin myofilaments, congenital myopathy with cores, and congenital myopathy with fiber-type disproportion, diseases that lead to muscle fiber defects. actin, alpha 1, skeletal muscle ACTA1 ENSG00000143632 NA
4629 The protein encoded by this gene is a smooth muscle myosin belonging to the myosin heavy chain family. The gene product is a subunit of a hexameric protein that consists of two heavy chain subunits and two pairs of non-identical light chain subunits. It functions as a major contractile protein, converting chemical energy into mechanical energy through the hydrolysis of ATP. The gene encoding a human ortholog of rat NUDE1 is transcribed from the reverse strand of this gene, and its 3’ end overlaps with that of the latter. The pericentric inversion of chromosome 16 [inv(16)(p13q22)] produces a chimeric transcript that encodes a protein consisting of the first 165 residues from the N terminus of core-binding factor beta in a fusion with the C-terminal portion of the smooth muscle myosin heavy chain. This chromosomal rearrangement is associated with acute myeloid leukemia of the M4Eo subtype. Alternative splicing generates isoforms that are differentially expressed, with ratios changing during muscle cell maturation. Alternatively spliced transcript variants encoding different isoforms have been identified. myosin, heavy chain 11, smooth muscle MYH11 ENSG00000133392 NA
ENSG00000180139 NA ACTA2 antisense RNA 1 ACTA2-AS1 ENSG00000180139 NA
4638 This gene, a muscle member of the immunoglobulin gene superfamily, encodes myosin light chain kinase which is a calcium/calmodulin dependent enzyme. This kinase phosphorylates myosin regulatory light chains to facilitate myosin interaction with actin filaments to produce contractile activity. This gene encodes both smooth muscle and nonmuscle isoforms. In addition, using a separate promoter in an intron in the 3’ region, it encodes telokin, a small protein identical in sequence to the C-terminus of myosin light chain kinase, that is independently expressed in smooth muscle and functions to stabilize unphosphorylated myosin filaments. A pseudogene is located on the p arm of chromosome 3. Four transcript variants that produce four isoforms of the calcium/calmodulin dependent enzyme have been identified as well as two transcripts that produce two isoforms of telokin. Additional variants have been identified but lack full length transcripts. myosin light chain kinase MYLK ENSG00000065534 NA
60 This gene encodes one of six different actin proteins. Actins are highly conserved proteins that are involved in cell motility, structure, and integrity. This actin is a major constituent of the contractile apparatus and one of the two nonmuscle cytoskeletal actins. actin, beta ACTB ENSG00000075624 NA
5730 The protein encoded by this gene is a glutathione-independent prostaglandin D synthase that catalyzes the conversion of prostaglandin H2 (PGH2) to prostaglandin D2 (PGD2). PGD2 functions as a neuromodulator as well as a trophic factor in the central nervous system. PGD2 is also involved in smooth muscle contraction/relaxation and is a potent inhibitor of platelet aggregation. This gene is preferentially expressed in brain. Studies with transgenic mice overexpressing this gene suggest that this gene may be also involved in the regulation of non-rapid eye movement sleep. prostaglandin D2 synthase PTGDS ENSG00000107317 NA
4155 The protein encoded by the classic MBP gene is a major constituent of the myelin sheath of oligodendrocytes and Schwann cells in the nervous system. However, MBP-related transcripts are also present in the bone marrow and the immune system. These mRNAs arise from the long MBP gene (otherwise called ‘Golli-MBP’) that contains 3 additional exons located upstream of the classic MBP exons. Alternative splicing from the Golli and the MBP transcription start sites gives rise to 2 sets of MBP-related transcripts and gene products. The Golli mRNAs contain 3 exons unique to Golli-MBP, spliced in-frame to 1 or more MBP exons. They encode hybrid proteins that have N-terminal Golli aa sequence linked to MBP aa sequence. The second family of transcripts contain only MBP exons and produce the well characterized myelin basic proteins. This complex gene structure is conserved among species suggesting that the MBP transcription unit is an integral part of the Golli transcription unit and that this arrangement is important for the function and/or regulation of these genes. myelin basic protein MBP ENSG00000197971 NA
4637 Myosin is a hexameric ATPase cellular motor protein. It is composed of two heavy chains, two nonphosphorylatable alkali light chains, and two phosphorylatable regulatory light chains. This gene encodes a myosin alkali light chain that is expressed in smooth muscle and non-muscle tissues. Genomic sequences representing several pseudogenes have been described and two transcript variants encoding different isoforms have been identified for this gene. myosin light chain 6 MYL6 ENSG00000092841 NA
23336 The protein encoded by this gene is an intermediate filament (IF) family member. IF proteins are cytoskeletal proteins that confer resistance to mechanical stress and are encoded by a dispersed multigene family. This protein has been found to form a linkage between desmin, which is a subunit of the IF network, and the extracellular matrix, and provides an important structural support in muscle. Two alternatively spliced variants encoding different isoforms have been described for this gene. synemin SYNM ENSG00000182253 NA
87 Alpha actinins belong to the spectrin gene superfamily which represents a diverse group of cytoskeletal proteins, including the alpha and beta spectrins and dystrophins. Alpha actinin is an actin-binding protein with multiple roles in different cell types. In nonmuscle cells, the cytoskeletal isoform is found along microfilament bundles and adherens-type junctions, where it is involved in binding actin to the membrane. In contrast, skeletal, cardiac, and smooth muscle isoforms are localized to the Z-disc and analogous dense bodies, where they help anchor the myofibrillar actin filaments. This gene encodes a nonmuscle, cytoskeletal, alpha actinin isoform and maps to the same site as the structurally similar erythroid beta spectrin gene. Three transcript variants encoding different isoforms have been found for this gene. actinin alpha 1 ACTN1 ENSG00000072110 NA
10398 Myosin, a structural component of muscle, consists of two heavy chains and four light chains. The protein encoded by this gene is a myosin light chain that may regulate muscle contraction by modulating the ATPase activity of myosin heads. The encoded protein binds calcium and is activated by myosin light chain kinase. Two transcript variants encoding different isoforms have been found for this gene. myosin light chain 9 MYL9 ENSG00000101335 NA
7052 Transglutaminases are enzymes that catalyze the crosslinking of proteins by epsilon-gamma glutamyl lysine isopeptide bonds. While the primary structure of transglutaminases is not conserved, they all have the same amino acid sequence at their active sites and their activity is calcium-dependent. The protein encoded by this gene acts as a monomer, is induced by retinoic acid, and appears to be involved in apoptosis. Finally, the encoded protein is the autoantigen implicated in celiac disease. Two transcript variants encoding different isoforms have been found for this gene. transglutaminase 2 TGM2 ENSG00000198959 NA
NA NA NA NA ENSG00000259716 TRUE
84033 The obscurin gene spans more than 150 kb, contains over 80 exons and encodes a protein of approximately 720 kDa. The encoded protein contains 68 Ig domains, 2 fibronectin domains, 1 calcium/calmodulin-binding domain, 1 RhoGEF domain with an associated PH domain, and 2 serine-threonine kinase domains. This protein belongs to the family of giant sarcomeric signaling proteins that includes titin and nebulin, and may have a role in the organization of myofibrils during assembly and may mediate interactions between the sarcoplasmic reticulum and myofibrils. Alternatively spliced transcript variants encoding different isoforms have been identified. obscurin, cytoskeletal calmodulin and titin-interacting RhoGEF OBSCN ENSG00000154358 NA
158471 The protein encoded by this gene belongs to the B-cell CLL/lymphoma 2 and adenovirus E1B 19 kDa interacting family, whose members play roles in many cellular processes including apoptosis, cell transformation, and synaptic function. Several functions for this protein have been demonstrated including suppression of Ras homolog family member A activity, which results in reduced stress fiber formation and suppression of oncogenic cellular transformation. A high molecular weight isoform of this protein has also been shown to colocalize with Adaptor protein complex 2, beta-Adaptin and endodermal markers, suggesting an involvement in post-endocytic trafficking. In prostate cancer cells, this gene acts as a tumor suppressor and its expression is regulated by prostate cancer antigen 3, a non-protein coding gene on the opposite DNA strand in an intron of this gene. Prostate cancer antigen 3 regulates levels of this gene through formation of a double-stranded RNA that undergoes adenosine deaminase acting on RNA-dependent adenosine-to-inosine RNA editing. Alternative splicing results in multiple transcript variants. prune homolog 2 PRUNE2 ENSG00000106772 NA
3040 The human alpha globin gene cluster located on chromosome 16 spans about 30 kb and includes seven loci: 5’- zeta - pseudozeta - mu - pseudoalpha-1 - alpha-2 - alpha-1 - theta - 3’. The alpha-2 (HBA2) and alpha-1 (HBA1) coding sequences are identical. These genes differ slightly over the 5’ untranslated regions and the introns, but they differ significantly over the 3’ untranslated regions. Two alpha chains plus two beta chains constitute HbA, which in normal adult life comprises about 97% of the total hemoglobin; alpha chains combine with delta chains to constitute HbA-2, which with HbF (fetal hemoglobin) makes up the remaining 3% of adult hemoglobin. Alpha thalassemias result from deletions of each of the alpha genes as well as deletions of both HBA2 and HBA1; some nondeletion alpha thalassemias have also been reported. hemoglobin subunit alpha 2 HBA2 ENSG00000188536 NA
55679 This gene encodes a member of a small family of focal adhesion proteins which interacts with ILK (integrin-linked kinase), a protein which effects protein-protein interactions with the extracellular matrix. The encoded protein has five LIM domains, each domain forming two zinc fingers, which permit interactions which regulate cell shape and migration. A pseudogene of this gene is located on chromosome 4. Multiple transcript variants encoding different isoforms have been found for this gene. LIM zinc finger domain containing 2 LIMS2 ENSG00000072163 NA
6876 The protein encoded by this gene is a transformation and shape-change sensitive actin cross-linking/gelling protein found in fibroblasts and smooth muscle. Its expression is down-regulated in many cell lines, and this down-regulation may be an early and sensitive marker for the onset of transformation. A functional role of this protein is unclear. Two transcript variants encoding the same protein have been found for this gene. transgelin TAGLN ENSG00000149591 NA
6319 This gene encodes an enzyme involved in fatty acid biosynthesis, primarily the synthesis of oleic acid. The protein belongs to the fatty acid desaturase family and is an integral membrane protein located in the endoplasmic reticulum. Transcripts of approximately 3.9 and 5.2 kb, differing only by alternative polyadenylation signals, have been detected. A gene encoding a similar enzyme is located on chromosome 4 and a pseudogene of this gene is located on chromosome 17. stearoyl-CoA desaturase SCD ENSG00000099194 NA
ENSG00000269936 NA NA RP11-394O4.5 ENSG00000269936 NA
11034 The product of this gene belongs to the actin-binding proteins ADF family. This family of proteins is responsible for enhancing the turnover rate of actin in vivo. This gene encodes the actin depolymerizing protein that severs actin filaments (F-actin) and binds to actin monomers (G-actin). Two transcript variants encoding distinct isoforms have been identified for this gene. destrin, actin depolymerizing factor DSTN ENSG00000125868 NA
4633 This gene encodes the regulatory light chain associated with cardiac myosin beta (or slow) heavy chain. Ca2+ triggers the phosphorylation of regulatory light chain that in turn triggers contraction. Mutations in this gene are associated with mid-left ventricular chamber type hypertrophic cardiomyopathy. myosin light chain 2 MYL2 ENSG00000111245 NA
4162 NA melanoma cell adhesion molecule MCAM ENSG00000076706 NA
1465 This gene encodes a member of the cysteine-rich protein (CSRP) family. This gene family includes a group of LIM domain proteins, which may be involved in regulatory processes important for development and cellular differentiation. The LIM/double zinc-finger motif found in this gene product occurs in proteins with critical functions in gene regulation, cell growth, and somatic differentiation. Alternatively spliced transcript variants have been described. cysteine and glycine rich protein 1 CSRP1 ENSG00000159176 NA
2878 This gene product belongs to the glutathione peroxidase family, which functions in the detoxification of hydrogen peroxide. It contains a selenocysteine (Sec) residue at its active site. The selenocysteine is encoded by the UGA codon, which normally signals translation termination. The 3’ UTR of Sec-containing genes have a common stem-loop structure, the sec insertion sequence (SECIS), which is necessary for the recognition of UGA as a Sec codon rather than as a stop signal. glutathione peroxidase 3 GPX3 ENSG00000211445 NA
25959 NA KN motif and ankyrin repeat domains 2 KANK2 ENSG00000197256 NA
7168 This gene is a member of the tropomyosin family of highly conserved, widely distributed actin-binding proteins involved in the contractile system of striated and smooth muscles and the cytoskeleton of non-muscle cells. Tropomyosin is composed of two alpha-helical chains arranged as a coiled-coil. It is polymerized end to end along the two grooves of actin filaments and provides stability to the filaments. The encoded protein is one type of alpha helical chain that forms the predominant tropomyosin of striated muscle, where it also functions in association with the troponin complex to regulate the calcium-dependent interaction of actin and myosin during muscle contraction. In smooth muscle and non-muscle cells, alternatively spliced transcript variants encoding a range of isoforms have been described. Mutations in this gene are associated with type 3 familial hypertrophic cardiomyopathy. tropomyosin 1 (alpha) TPM1 ENSG00000140416 NA
ENSG00000259627 NA NA RP11-244F12.2 ENSG00000259627 NA
2670 This gene encodes one of the major intermediate filament proteins of mature astrocytes. It is used as a marker to distinguish astrocytes from other glial cells during development. Mutations in this gene cause Alexander disease, a rare disorder of astrocytes in the central nervous system. Alternative splicing results in multiple transcript variants encoding distinct isoforms. glial fibrillary acidic protein GFAP ENSG00000131095 NA
6279 The protein encoded by this gene is a member of the S100 family of proteins containing 2 EF-hand calcium-binding motifs. S100 proteins are localized in the cytoplasm and/or nucleus of a wide range of cells, and involved in the regulation of a number of cellular processes such as cell cycle progression and differentiation. S100 genes include at least 13 members which are located as a cluster on chromosome 1q21. This protein may function in the inhibition of casein kinase and as a cytokine. Altered expression of this protein is associated with the disease cystic fibrosis. Multiple transcript variants encoding different isoforms have been found for this gene. S100 calcium binding protein A8 S100A8 ENSG00000143546 NA
88 Alpha actinins belong to the spectrin gene superfamily which represents a diverse group of cytoskeletal proteins, including the alpha and beta spectrins and dystrophins. Alpha actinin is an actin-binding protein with multiple roles in different cell types. In nonmuscle cells, the cytoskeletal isoform is found along microfilament bundles and adherens-type junctions, where it is involved in binding actin to the membrane. In contrast, skeletal, cardiac, and smooth muscle isoforms are localized to the Z-disc and analogous dense bodies, where they help anchor the myofibrillar actin filaments. This gene encodes a muscle-specific, alpha actinin isoform that is expressed in both skeletal and cardiac muscles. Several transcript variants encoding different isoforms have been found for this gene. actinin alpha 2 ACTN2 ENSG00000077522 NA
7134 Troponin is a central regulatory protein of striated muscle contraction, and together with tropomyosin, is located on the actin filament. Troponin consists of 3 subunits: TnI, which is the inhibitor of actomyosin ATPase; TnT, which contains the binding site for tropomyosin; and TnC, the protein encoded by this gene. The binding of calcium to TnC abolishes the inhibitory action of TnI, thus allowing the interaction of actin with myosin, the hydrolysis of ATP, and the generation of tension. Mutations in this gene are associated with cardiomyopathy dilated type 1Z. troponin C1, slow skeletal and cardiac type TNNC1 ENSG00000114854 NA
94274 The protein encoded by this gene belongs to the protein phosphatase 1 (PP1) inhibitor family. This protein is an inhibitor of smooth muscle myosin phosphatase, and has higher inhibitory activity when phosphorylated. Inhibition of myosin phosphatase leads to increased myosin phosphorylation and enhanced smooth muscle contraction. Alternatively spliced transcript variants encoding different isoforms have been noted for this gene. protein phosphatase 1 regulatory inhibitor subunit 14A PPP1R14A ENSG00000167641 NA
123 The protein encoded by this gene belongs to the perilipin family, members of which coat intracellular lipid storage droplets. This protein is associated with the lipid globule surface membrane material, and may be involved in development and maintenance of adipose tissue. However, it is not restricted to adipocytes as previously thought, but is found in a wide range of cultured cell lines, including fibroblasts, endothelial and epithelial cells, and tissues, such as lactating mammary gland, adrenal cortex, Sertoli and Leydig cells, and hepatocytes in alcoholic liver cirrhosis, suggesting that it may serve as a marker of lipid accumulation in diverse cell types and diseases. Alternatively spliced transcript variants have been found for this gene. perilipin 2 PLIN2 ENSG00000147872 NA
6280 The protein encoded by this gene is a member of the S100 family of proteins containing 2 EF-hand calcium-binding motifs. S100 proteins are localized in the cytoplasm and/or nucleus of a wide range of cells, and involved in the regulation of a number of cellular processes such as cell cycle progression and differentiation. S100 genes include at least 13 members which are located as a cluster on chromosome 1q21. This protein may function in the inhibition of casein kinase and altered expression of this protein is associated with the disease cystic fibrosis. This antimicrobial protein exhibits antifungal and antibacterial activity. S100 calcium binding protein A9 S100A9 ENSG00000163220 NA
3911 This gene encodes one of the vertebrate laminin alpha chains. Laminins, a family of extracellular matrix glycoproteins, are the major noncollagenous constituent of basement membranes. They have been implicated in a wide variety of biological processes including cell adhesion, differentiation, migration, signaling, neurite outgrowth and metastasis. Laminins are composed of 3 non identical chains: laminin alpha, beta and gamma (formerly A, B1, and B2, respectively) and they form a cruciform structure consisting of 3 short arms, each formed by a different chain, and a long arm composed of all 3 chains. Each laminin chain is a multidomain protein encoded by a distinct gene. The protein encoded by this gene is the alpha-5 subunit of laminin-10 (laminin-511), laminin-11 (laminin-521) and laminin-15 (laminin-523). laminin subunit alpha 5 LAMA5 ENSG00000130702 NA
682 The protein encoded by this gene is a plasma membrane protein that is important in spermatogenesis, embryo implantation, neural network formation, and tumor progression. The encoded protein is also a member of the immunoglobulin superfamily. Multiple transcript variants encoding different isoforms have been found for this gene. basigin (Ok blood group) BSG ENSG00000172270 NA
493 The protein encoded by this gene belongs to the family of P-type primary ion transport ATPases characterized by the formation of an aspartyl phosphate intermediate during the reaction cycle. These enzymes remove bivalent calcium ions from eukaryotic cells against very large concentration gradients and play a critical role in intracellular calcium homeostasis. The mammalian plasma membrane calcium ATPase isoforms are encoded by at least four separate genes and the diversity of these enzymes is further increased by alternative splicing of transcripts. The expression of different isoforms and splice variants is regulated in a developmental, tissue- and cell type-specific manner, suggesting that these pumps are functionally adapted to the physiological needs of particular cells and tissues. This gene encodes the plasma membrane calcium ATPase isoform 4. Alternatively spliced transcript variants encoding different isoforms have been identified. ATPase plasma membrane Ca2+ transporting 4 ATP2B4 ENSG00000058668 NA
1277 This gene encodes the pro-alpha1 chains of type I collagen whose triple helix comprises two alpha1 chains and one alpha2 chain. Type I is a fibril-forming collagen found in most connective tissues and is abundant in bone, cornea, dermis and tendon. Mutations in this gene are associated with osteogenesis imperfecta types I-IV, Ehlers-Danlos syndrome type VIIA, Ehlers-Danlos syndrome Classical type, Caffey Disease and idiopathic osteoporosis. Reciprocal translocations between chromosomes 17 and 22, where this gene and the gene for platelet-derived growth factor beta are located, are associated with a particular type of skin tumor called dermatofibrosarcoma protuberans, resulting from unregulated expression of the growth factor. Two transcripts, resulting from the use of alternate polyadenylation signals, have been identified for this gene. collagen type I alpha 1 COL1A1 ENSG00000108821 NA
4256 The protein encoded by this gene is secreted and likely acts as an inhibitor of bone formation. The encoded protein is found in the organic matrix of bone and cartilage. Defects in this gene are a cause of Keutel syndrome (KS). Two transcript variants encoding different isoforms have been found for this gene. matrix Gla protein MGP ENSG00000111341 NA
NA NA NA NA ENSG00000256545 TRUE
1158 The protein encoded by this gene is a cytoplasmic enzyme involved in energy homeostasis and is an important serum marker for myocardial infarction. The encoded protein reversibly catalyzes the transfer of phosphate between ATP and various phosphogens such as creatine phosphate. It acts as a homodimer in striated muscle as well as in other tissues, and as a heterodimer with a similar brain isozyme in heart. The encoded protein is a member of the ATP:guanido phosphotransferase protein family. creatine kinase, M-type CKM ENSG00000104879 NA
23413 This gene is a member of the neuronal calcium sensor gene family, which encode calcium-binding proteins expressed predominantly in neurons. The protein encoded by this gene regulates G protein-coupled receptor phosphorylation in a calcium-dependent manner and can substitute for calmodulin. The protein is associated with secretory granules and modulates synaptic transmission and synaptic plasticity. Multiple transcript variants encoding different isoforms have been found for this gene. neuronal calcium sensor 1 NCS1 ENSG00000107130 NA
2819 This gene encodes a member of the NAD-dependent glycerol-3-phosphate dehydrogenase family. The encoded protein plays a critical role in carbohydrate and lipid metabolism by catalyzing the reversible conversion of dihydroxyacetone phosphate (DHAP) and reduced nicotinamide adenine dinucleotide (NADH) to glycerol-3-phosphate (G3P) and NAD+. The encoded cytosolic protein and mitochondrial glycerol-3-phosphate dehydrogenase also form a glycerol phosphate shuttle that facilitates the transfer of reducing equivalents from the cytosol to mitochondria. Mutations in this gene are a cause of transient infantile hypertriglyceridemia. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene. glycerol-3-phosphate dehydrogenase 1 GPD1 ENSG00000167588 NA
7314 This gene encodes ubiquitin, one of the most conserved proteins known. Ubiquitin has a major role in targeting cellular proteins for degradation by the 26S proteasome. It is also involved in the maintenance of chromatin structure, the regulation of gene expression, and the stress response. Ubiquitin is synthesized as a precursor protein consisting of either polyubiquitin chains or a single ubiquitin moiety fused to an unrelated protein. This gene consists of three direct repeats of the ubiquitin coding sequence with no spacer sequence. Consequently, the protein is expressed as a polyubiquitin precursor with a final amino acid after the last repeat. An aberrant form of this protein has been detected in patients with Alzheimer’s disease and Down syndrome. Pseudogenes of this gene are located on chromosomes 1, 2, 13, and 17. Alternative splicing results in multiple transcript variants. ubiquitin B UBB ENSG00000170315 NA
8490 This gene encodes a member of the regulators of G protein signaling (RGS) family. The RGS proteins are signal transduction molecules which are involved in the regulation of heterotrimeric G proteins by acting as GTPase activators. This gene is a hypoxia-inducible factor-1 dependent, hypoxia-induced gene which is involved in the induction of endothelial apoptosis. This gene is also one of three genes on chromosome 1q contributing to elevated blood pressure. Alternatively spliced transcript variants have been identified. regulator of G-protein signaling 5 RGS5 ENSG00000143248 NA
4240 This gene encodes a preproprotein that is proteolytically processed to form multiple protein products. The major encoded protein product, lactadherin, is a membrane glycoprotein that promotes phagocytosis of apoptotic cells. This protein has also been implicated in wound healing, autoimmune disease, and cancer. Lactadherin can be further processed to form a smaller cleavage product, medin, which comprises the major protein component of aortic medial amyloid (AMA). Alternative splicing results in multiple transcript variants. milk fat globule-EGF factor 8 protein MFGE8 ENSG00000140545 NA
5837 This gene encodes a muscle enzyme involved in glycogenolysis. Highly similar enzymes encoded by different genes are found in liver and brain. Mutations in this gene are associated with McArdle disease (myophosphorylase deficiency), a glycogen storage disease of muscle. Alternative splicing results in multiple transcript variants. phosphorylase, glycogen, muscle PYGM ENSG00000068976 NA
23022 This gene encodes a cytoskeletal protein that is required for organizing the actin cytoskeleton. The protein is a component of actin-containing microfilaments, and it is involved in the control of cell shape, adhesion, and contraction. Polymorphisms in this gene are associated with a susceptibility to pancreatic cancer type 1, and also with a risk for myocardial infarction. Alternative splicing results in multiple transcript variants. palladin, cytoskeletal associated protein PALLD ENSG00000129116 NA
1809 NA dihydropyrimidinase like 3 DPYSL3 ENSG00000113657 NA
3860 The protein encoded by this gene is a member of the keratin gene family. The keratins are intermediate filament proteins responsible for the structural integrity of epithelial cells and are subdivided into cytokeratins and hair keratins. Most of the type I cytokeratins consist of acidic proteins which are arranged in pairs of heterotypic keratin chains. This type I cytokeratin is paired with keratin 4 and expressed in the suprabasal layers of non-cornified stratified epithelia. Mutations in this gene and keratin 4 have been associated with the autosomal dominant disorder White Sponge Nevus. The type I cytokeratins are clustered in a region of chromosome 17q21.2. Alternative splicing of this gene results in multiple transcript variants; however, not all variants have been described. keratin 13 KRT13 ENSG00000171401 NA
3039 The human alpha globin gene cluster located on chromosome 16 spans about 30 kb and includes seven loci: 5’- zeta - pseudozeta - mu - pseudoalpha-1 - alpha-2 - alpha-1 - theta - 3’. The alpha-2 (HBA2) and alpha-1 (HBA1) coding sequences are identical. These genes differ slightly over the 5’ untranslated regions and the introns, but they differ significantly over the 3’ untranslated regions. Two alpha chains plus two beta chains constitute HbA, which in normal adult life comprises about 97% of the total hemoglobin; alpha chains combine with delta chains to constitute HbA-2, which with HbF (fetal hemoglobin) makes up the remaining 3% of adult hemoglobin. Alpha thalassemias result from deletions of each of the alpha genes as well as deletions of both HBA2 and HBA1; some nondeletion alpha thalassemias have also been reported. hemoglobin subunit alpha 1 HBA1 ENSG00000206172 NA
8557 Sarcomere assembly is regulated by the muscle protein titin. Titin is a giant elastic protein with kinase activity that extends half the length of a sarcomere. It serves as a scaffold to which myofibrils and other muscle related proteins are attached. This gene encodes a protein found in striated and cardiac muscle that binds to the titin Z1-Z2 domains and is a substrate of titin kinase, interactions thought to be critical to sarcomere assembly. Mutations in this gene are associated with limb-girdle muscular dystrophy type 2G. titin-cap TCAP ENSG00000173991 NA
25802 The leiomodin 1 protein has a putative membrane-spanning region and 2 types of tandemly repeated blocks. The transcript is expressed in all tissues tested, with the highest levels in thyroid, eye muscle, skeletal muscle, and ovary. Increased expression of leiomodin 1 may be linked to Graves’ disease and thyroid-associated ophthalmopathy. leiomodin 1 LMOD1 ENSG00000163431 NA
9260 The protein encoded by this gene is representative of a family of proteins composed of conserved PDZ and LIM domains. LIM domains are proposed to function in protein-protein recognition in a variety of contexts including gene transcription and development and in cytoskeletal interaction. The LIM domains of this protein bind to protein kinases, whereas the PDZ domain binds to actin filaments. The gene product is involved in the assembly of an actin filament-associated complex essential for transmission of ret/ptc2 mitogenic signaling. The biological function is likely to be that of an adapter, with the PDZ domain localizing the LIM-binding proteins to actin filaments of both skeletal muscle and nonmuscle tissues. Alternative splicing of this gene results in multiple transcript variants. PDZ and LIM domain 7 PDLIM7 ENSG00000196923 NA
5166 This gene is a member of the PDK/BCKDK protein kinase family and encodes a mitochondrial protein with a histidine kinase domain. This protein is located in the matrix of the mitochondria and inhibits the pyruvate dehydrogenase complex by phosphorylating one of its subunits, thereby contributing to the regulation of glucose metabolism. Expression of this gene is regulated by glucocorticoids, retinoic acid and insulin. pyruvate dehydrogenase kinase 4 PDK4 ENSG00000004799 NA
4624 Cardiac muscle myosin is a hexamer consisting of two heavy chain subunits, two light chain subunits, and two regulatory subunits. This gene encodes the alpha heavy chain subunit of cardiac myosin. The gene is located 4kb downstream of the gene encoding the beta heavy chain subunit of cardiac myosin. Mutations in this gene cause familial hypertrophic cardiomyopathy and atrial septal defect 3. myosin, heavy chain 6, cardiac muscle, alpha MYH6 ENSG00000197616 NA
1281 This gene encodes the pro-alpha1 chains of type III collagen, a fibrillar collagen that is found in extensible connective tissues such as skin, lung, uterus, intestine and the vascular system, frequently in association with type I collagen. Mutations in this gene are associated with Ehlers-Danlos syndrome type IV, and with aortic and arterial aneurysms. Two transcripts, resulting from the use of alternate polyadenylation signals, have been identified for this gene. collagen type III alpha 1 chain COL3A1 ENSG00000168542 NA
7846 Microtubules of the eukaryotic cytoskeleton perform essential and diverse functions and are composed of a heterodimer of alpha and beta tubulins. The genes encoding these microtubule constituents belong to the tubulin superfamily, which is composed of six distinct families. Genes from the alpha, beta and gamma tubulin families are found in all eukaryotes. The alpha and beta tubulins represent the major components of microtubules, while gamma tubulin plays a critical role in the nucleation of microtubule assembly. There are multiple alpha and beta tubulin genes, which are highly conserved among species. This gene encodes alpha tubulin and is highly similar to the mouse and rat Tuba1 genes. Northern blotting studies have shown that the gene expression is predominantly found in morphologically differentiated neurologic cells. This gene is one of three alpha-tubulin genes in a cluster on chromosome 12q. Mutations in this gene cause lissencephaly type 3 (LIS3) - a neurological condition characterized by microcephaly, mental retardation, and early-onset epilepsy and caused by defective neuronal migration. Alternative splicing results in multiple transcript variants encoding distinct isoforms. tubulin alpha 1a TUBA1A ENSG00000167552 NA
7138 This gene encodes a protein that is a subunit of troponin, which is a regulatory complex located on the thin filament of the sarcomere. This complex regulates striated muscle contraction in response to fluctuations in intracellular calcium concentration. This complex is composed of three subunits: troponin C, which binds calcium, troponin T, which binds tropomyosin, and troponin I, which is an inhibitory subunit. This protein is the slow skeletal troponin T subunit. Mutations in this gene cause nemaline myopathy type 5, also known as Amish nemaline myopathy, a neuromuscular disorder characterized by muscle weakness and rod-shaped, or nemaline, inclusions in skeletal muscle fibers which affects infants, resulting in death due to respiratory insufficiency, usually in the second year. Multiple transcript variants encoding different isoforms have been found for this gene. troponin T1, slow skeletal type TNNT1 ENSG00000105048 NA
283120 This gene is located in an imprinted region of chromosome 11 near the insulin-like growth factor 2 (IGF2) gene. This gene is only expressed from the maternally-inherited chromosome, whereas IGF2 is only expressed from the paternally-inherited chromosome. The product of this gene is a long non-coding RNA which functions as a tumor suppressor. Mutations in this gene have been associated with Beckwith-Wiedemann Syndrome and Wilms tumorigenesis. Alternative splicing results in multiple transcript variants. H19, imprinted maternally expressed transcript (non-protein coding) H19 ENSG00000130600 NA
5997 Regulator of G protein signaling (RGS) family members are regulatory molecules that act as GTPase activating proteins (GAPs) for G alpha subunits of heterotrimeric G proteins. RGS proteins are able to deactivate G protein subunits of the Gi alpha, Go alpha and Gq alpha subtypes. They drive G proteins into their inactive GDP-bound forms. Regulator of G protein signaling 2 belongs to this family. The protein acts as a mediator of myeloid differentiation and may play a role in leukemogenesis. regulator of G-protein signaling 2 RGS2 ENSG00000116741 NA
165 This gene encodes a member of carboxypeptidase A protein family. The encoded protein may function as a transcriptional repressor and play a role in adipogenesis and smooth muscle cell differentiation. Studies in mice suggest that this gene functions in wound healing and abdominal wall development. Overexpression of this gene is associated with glioblastoma. AE binding protein 1 AEBP1 ENSG00000106624 NA
283131 This gene produces a long non-coding RNA (lncRNA) transcribed from the multiple endocrine neoplasia locus. This lncRNA is retained in the nucleus where it forms the core structural component of the paraspeckle sub-organelles. It may act as a transcriptional regulator for numerous genes, including some genes involved in cancer progression. nuclear paraspeckle assembly transcript 1 (non-protein coding) NEAT1 ENSG00000245532 NA
7316 This gene represents a ubiquitin gene, ubiquitin C. The encoded protein is a polyubiquitin precursor. Conjugation of ubiquitin monomers or polymers can lead to various effects within a cell, depending on the residues to which ubiquitin is conjugated. Ubiquitination has been associated with protein degradation, DNA repair, cell cycle regulation, kinase modification, endocytosis, and regulation of other cell signaling pathways. ubiquitin C UBC ENSG00000150991 NA
122622 This gene encodes a member of the adenylosuccinate synthase family of proteins. The encoded muscle-specific enzyme plays a role in the purine nucleotide cycle by catalyzing the first step in the conversion of inosine monophosphate (IMP) to adenosine monophosphate (AMP). Mutations in this gene may cause adolescent onset distal myopathy. Alternative splicing results in multiple transcript variants. adenylosuccinate synthase like 1 ADSSL1 ENSG00000185100 NA
ENSG00000266844 NA NA RP11-862L9.3 ENSG00000266844 NA
6525 This gene encodes a structural protein that is found exclusively in contractile smooth muscle cells. It associates with stress fibers and constitutes part of the cytoskeleton. This gene is localized to chromosome 22q12.3, distal to the TUPLE1 locus and outside the DiGeorge syndrome deletion. Alternative splicing of this gene results in multiple transcript variants encoding distinct isoforms. smoothelin SMTN ENSG00000183963 NA
3320 The protein encoded by this gene is an inducible molecular chaperone that functions as a homodimer. The encoded protein aids in the proper folding of specific target proteins by use of an ATPase activity that is modulated by co-chaperones. Two transcript variants encoding different isoforms have been found for this gene. heat shock protein 90kDa alpha family class A member 1 HSP90AA1 ENSG00000080824 NA
761 Carbonic anhydrase III (CAIII) is a member of a multigene family (at least six separate genes are known) that encodes carbonic anhydrase isozymes. These carbonic anhydrases are a class of metalloenzymes that catalyze the reversible hydration of carbon dioxide and are differentially expressed in a number of cell types. The expression of the CA3 gene is strictly tissue specific and present at high levels in skeletal muscle and much lower levels in cardiac and smooth muscle. A proportion of carriers of Duchenne muscular dystrophy have a higher CA3 level than normal. The gene spans 10.3 kb and contains seven exons and six introns. carbonic anhydrase 3 CA3 ENSG00000164879 NA
5950 This protein belongs to the lipocalin family and is the specific carrier for retinol (vitamin A alcohol) in the blood. It delivers retinol from the liver stores to the peripheral tissues. In plasma, the RBP-retinol complex interacts with transthyretin which prevents its loss by filtration through the kidney glomeruli. A deficiency of vitamin A blocks secretion of the binding protein posttranslationally and results in defective delivery and supply to the epidermal cells. retinol binding protein 4 RBP4 ENSG00000138207 NA
10875 The protein encoded by this gene is a secreted protein that is similar to the beta- and gamma-chains of fibrinogen. The carboxyl-terminus of the encoded protein consists of the fibrinogen-related domains (FRED). The encoded protein forms a tetrameric complex which is stabilized by interchain disulfide bonds. This protein may play a role in physiologic functions at mucosal sites. fibrinogen like 2 FGL2 ENSG00000127951 NA
388 NA ras homolog family member B RHOB ENSG00000143878 NA
ENSG00000261054 NA NA RP11-6O2.4 ENSG00000261054 NA
844 This gene encodes the skeletal muscle specific member of the calsequestrin protein family. Calsequestrin functions as a luminal sarcoplasmic reticulum calcium sensor in both cardiac and skeletal muscle cells. This protein, also known as calmitine, functions as a calcium regulator in the mitochondria of skeletal muscle. This protein is absent in patients with Duchenne and Becker types of muscular dystrophy. calsequestrin 1 CASQ1 ENSG00000143318 NA
4625 Muscle myosin is a hexameric protein containing 2 heavy chain subunits, 2 alkali light chain subunits, and 2 regulatory light chain subunits. This gene encodes the beta (or slow) heavy chain subunit of cardiac myosin. It is expressed predominantly in normal human ventricle. It is also expressed in skeletal muscle tissues rich in slow-twitch type I muscle fibers. Changes in the relative abundance of this protein and the alpha (or fast) heavy subunit of cardiac myosin correlate with the contractile velocity of cardiac muscle. Its expression is also altered during thyroid hormone depletion and hemodynamic overloading. Mutations in this gene are associated with familial hypertrophic cardiomyopathy, myosin storage myopathy, dilated cardiomyopathy, and Laing early-onset distal myopathy. myosin, heavy chain 7, cardiac muscle, beta MYH7 ENSG00000092054 NA
51177 NA pleckstrin homology domain containing O1 PLEKHO1 ENSG00000023902 NA
7450 This gene encodes a glycoprotein involved in hemostasis. The encoded preproprotein is proteolytically processed following assembly into large multimeric complexes. These complexes function in the adhesion of platelets to sites of vascular injury and the transport of various proteins in the blood. Mutations in this gene result in von Willebrand disease, an inherited bleeding disorder. An unprocessed pseudogene has been found on chromosome 22. von Willebrand factor VWF ENSG00000110799 NA
4628 This gene encodes a member of the myosin superfamily. The protein represents a conventional non-muscle myosin; it should not be confused with the unconventional myosin-10 (MYO10). Myosins are actin-dependent motor proteins with diverse functions including regulation of cytokinesis, cell motility, and cell polarity. Mutations in this gene have been associated with May-Hegglin anomaly and developmental defects in brain and heart. Multiple transcript variants encoding different isoforms have been found for this gene. myosin, heavy chain 10, non-muscle MYH10 ENSG00000133026 NA
11030 This gene encodes a member of the RNA recognition motif family of RNA-binding proteins. The RNA recognition motif is between 80-100 amino acids in length and family members contain one to four copies of the motif. The RNA recognition motif consists of two short stretches of conserved sequence, as well as a few highly conserved hydrophobic residues. The encoded protein has a single, putative RNA recognition motif in its N-terminus. Alternative splicing results in multiple transcript variants encoding different isoforms. RNA binding protein with multiple splicing RBPMS ENSG00000157110 NA
2879 This gene encodes a member of the glutathione peroxidase protein family. Glutathione peroxidase catalyzes the reduction of hydrogen peroxide, organic hydroperoxide, and lipid peroxides by reduced glutathione and functions in the protection of cells against oxidative damage. Human plasma glutathione peroxidase has been shown to be a selenium-containing enzyme and the UGA codon is translated into a selenocysteine. The encoded protein has been identified as a moonlighting protein based on its ability to serve dual functions as a peroxidase as well as a structural protein in mature spermatozoa. Through alternative splicing and transcription initiation, rat produces proteins that localize to the nucleus, mitochondrion, and cytoplasm. In humans, alternative transcription initiation and the cleavage sites of the mitochondrial and nuclear transit peptides need to be experimentally verified. Alternative splicing results in multiple transcript variants. glutathione peroxidase 4 GPX4 ENSG00000167468 NA
3851 The protein encoded by this gene is a member of the keratin gene family. The type II cytokeratins consist of basic or neutral proteins which are arranged in pairs of heterotypic keratin chains coexpressed during differentiation of simple and stratified epithelial tissues. This type II cytokeratin is specifically expressed in differentiated layers of the mucosal and esophageal epithelia with family member KRT13. Mutations in these genes have been associated with White Sponge Nevus, characterized by oral, esophageal, and anal leukoplakia. The type II cytokeratins are clustered in a region of chromosome 12q12-q13. keratin 4 KRT4 ENSG00000170477 NA
58529 The protein encoded by this gene is primarily expressed in the skeletal muscle, and belongs to the myozenin family. Members of this family function as calcineurin-interacting proteins that help tether calcineurin to the sarcomere of cardiac and skeletal muscle. They play an important role in modulation of calcineurin signaling. myozenin 1 MYOZ1 ENSG00000177791 NA
5216 This gene encodes a member of the profilin family of small actin-binding proteins. The encoded protein plays an important role in actin dynamics by regulating actin polymerization in response to extracellular signals. Deletion of this gene is associated with Miller-Dieker syndrome, and the encoded protein may also play a role in Huntington disease. Multiple pseudogenes of this gene are located on chromosome 1. profilin 1 PFN1 ENSG00000108518 NA
51559 NA 5’-nucleotidase domain containing 3 NT5DC3 ENSG00000111696 NA
5662 This gene encodes a Pleckstrin homology and SEC7 domains-containing protein that functions as a guanine nucleotide exchange factor. The encoded protein regulates signal transduction by activating ADP-ribosylation factor 6. Alternative splicing results in multiple transcript variants. pleckstrin and Sec7 domain containing PSD ENSG00000059915 NA
2027 This gene encodes one of the three enolase isoenzymes found in mammals. This isoenzyme is found in skeletal muscle cells in the adult where it may play a role in muscle development and regeneration. A switch from alpha enolase to beta enolase occurs in muscle tissue during development in rodents. Mutations in this gene have been associated with glycogen storage disease. Alternatively spliced transcript variants encoding different isoforms have been described. enolase 3 ENO3 ENSG00000108515 NA
2192 Fibulin 1 is a secreted glycoprotein that becomes incorporated into a fibrillar extracellular matrix. Calcium-binding is apparently required to mediate its binding to laminin and nidogen. It mediates platelet adhesion via binding fibrinogen. Four splice variants which differ in the 3’ end have been identified. Each variant encodes a different isoform, but no functional distinctions have been identified among the four variants. fibulin 1 FBLN1 ENSG00000077942 NA
79026 NA AHNAK nucleoprotein AHNAK ENSG00000124942 NA
3858 This gene encodes a member of the type I (acidic) cytokeratin family, which belongs to the superfamily of intermediate filament (IF) proteins. Keratins are heteropolymeric structural proteins which form the intermediate filament. These filaments, along with actin microfilaments and microtubules, compose the cytoskeleton of epithelial cells. Mutations in this gene are associated with epidermolytic hyperkeratosis. This gene is located within a cluster of keratin family members on chromosome 17q21. keratin 10 KRT10 ENSG00000186395 NA
111 This gene encodes a member of the membrane-bound adenylyl cyclase enzymes. Adenylyl cyclases mediate G protein-coupled receptor signaling through the synthesis of the second messenger cAMP. Activity of the encoded protein is stimulated by the Gs alpha subunit of G protein-coupled receptors and is inhibited by protein kinase A, calcium and Gi alpha subunits. Single nucleotide polymorphisms in this gene may be associated with low birth weight and type 2 diabetes. Alternatively spliced transcript variants that encode different isoforms have been observed for this gene. adenylate cyclase 5 ADCY5 ENSG00000173175 NA
219 This protein belongs to the aldehyde dehydrogenases family of proteins. Aldehyde dehydrogenase is the second enzyme of the major oxidative pathway of alcohol metabolism. This gene does not contain introns in the coding sequence. The variation of this locus may affect the development of alcohol-related problems. aldehyde dehydrogenase 1 family member B1 ALDH1B1 ENSG00000137124 NA
2194 The enzyme encoded by this gene is a multifunctional protein. Its main function is to catalyze the synthesis of palmitate from acetyl-CoA and malonyl-CoA, in the presence of NADPH, into long-chain saturated fatty acids. In some cancer cell lines, this protein has been found to be fused with estrogen receptor-alpha (ER-alpha), in which the N-terminus of FAS is fused in-frame with the C-terminus of ER-alpha. fatty acid synthase FASN ENSG00000169710 NA
1917 This gene encodes an isoform of the alpha subunit of the elongation factor-1 complex, which is responsible for the enzymatic delivery of aminoacyl tRNAs to the ribosome. This isoform (alpha 2) is expressed in brain, heart and skeletal muscle, and the other isoform (alpha 1) is expressed in brain, placenta, lung, liver, kidney, and pancreas. This gene may be critical in the development of ovarian cancer. eukaryotic translation elongation factor 1 alpha 2 EEF1A2 ENSG00000101210 NA
5364 NA plexin B1 PLXNB1 ENSG00000164050 NA
# Save the Ensembl IDs queried for factor 3, one per line.
write.table(as.factor(out$query), paste0("../utilities/GTEX2013_sparse_load_sqrt/gene_names_clus_", 3, ".txt"),
            col.names = FALSE, row.names = FALSE, quote = FALSE)
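
The export step above is repeated for each factor, with only the factor index changing. A minimal sketch of a helper that wraps it follows. The function `write_factor_genes`, its arguments, and the mock data frame `out_demo` are hypothetical names introduced for illustration, and the demo writes to a temporary directory rather than the `../utilities/` path used in this analysis.

```r
# Hypothetical helper: write the queried Ensembl IDs for one factor to
# "<out_dir>/gene_names_clus_<k>.txt", one ID per line.
write_factor_genes <- function(out, k, out_dir) {
  path <- file.path(out_dir, paste0("gene_names_clus_", k, ".txt"))
  # as.character() guards against the query column being a factor
  writeLines(as.character(out$query), con = path)
  invisible(path)
}

# Mock annotation result standing in for the mygene::queryMany() output.
out_demo <- data.frame(
  query  = c("ENSG00000163359", "ENSG00000111799", "ENSG00000256545"),
  symbol = c("COL6A3", "COL12A1", NA),
  stringsAsFactors = FALSE
)

demo_path <- write_factor_genes(out_demo, k = 4, out_dir = tempdir())
readLines(demo_path)
```

Passing the factor index as an argument keeps the file-name pattern in one place, so the per-factor sections cannot drift apart.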

Factor 4 Annotations

out <- mygene::queryMany(gene_list[4,],  scopes="ensembl.gene", fields=c("name", "summary", "symbol"), species="human");
## Finished
## Pass returnall=TRUE to return lists of duplicate or missing query terms.
kable(as.data.frame(out))
name summary X_id query symbol notfound
collagen type VI alpha 3 chain This gene encodes the alpha-3 chain, one of the three alpha chains of type VI collagen, a beaded filament collagen found in most connective tissues. The alpha-3 chain of type VI collagen is much larger than the alpha-1 and -2 chains. This difference in size is largely due to an increase in the number of subdomains, similar to von Willebrand Factor type A domains, that are found in the amino terminal globular domain of all the alpha chains. These domains have been shown to bind extracellular matrix proteins, an interaction that explains the importance of this collagen in organizing matrix components. Mutations in the type VI collagen genes are associated with Bethlem myopathy, a rare autosomal dominant proximal myopathy with early childhood onset. Mutations in this gene are also a cause of Ullrich congenital muscular dystrophy, also referred to as Ullrich scleroatonic muscular dystrophy, an autosomal recessive congenital myopathy that is more severe than Bethlem myopathy. Multiple transcript variants have been identified, but the full-length nature of only some of these variants has been described. 1293 ENSG00000163359 COL6A3 NA
collagen type XII alpha 1 chain This gene encodes the alpha chain of type XII collagen, a member of the FACIT (fibril-associated collagens with interrupted triple helices) collagen family. Type XII collagen is a homotrimer found in association with type I collagen, an association that is thought to modify the interactions between collagen I fibrils and the surrounding matrix. Alternatively spliced transcript variants encoding different isoforms have been identified. 1303 ENSG00000111799 COL12A1 NA
alpha-2-macroglobulin Alpha-2-macroglobulin is a protease inhibitor and cytokine transporter. It inhibits many proteases, including trypsin, thrombin and collagenase. A2M is implicated in Alzheimer disease (AD) due to its ability to mediate the clearance and degradation of A-beta, the major component of beta-amyloid deposits. 2 ENSG00000175899 A2M NA
fatty acid binding protein 4 FABP4 encodes the fatty acid binding protein found in adipocytes. Fatty acid binding proteins are a family of small, highly conserved, cytoplasmic proteins that bind long-chain fatty acids and other hydrophobic ligands. It is thought that FABPs’ roles include fatty acid uptake, transport, and metabolism. 2167 ENSG00000170323 FABP4 NA
perilipin 1 The protein encoded by this gene coats lipid storage droplets in adipocytes, thereby protecting them until they can be broken down by hormone-sensitive lipase. The encoded protein is the major cAMP-dependent protein kinase substrate in adipocytes and, when unphosphorylated, may play a role in the inhibition of lipolysis. Alternatively spliced transcript variants varying in the 5’ UTR, but encoding the same protein, have been found for this gene. 5346 ENSG00000166819 PLIN1 NA
membrane metallo-endopeptidase This gene encodes a common acute lymphocytic leukemia antigen that is an important cell surface marker in the diagnosis of human acute lymphocytic leukemia (ALL). This protein is present on leukemic cells of pre-B phenotype, which represent 85% of cases of ALL. This protein is not restricted to leukemic cells, however, and is found on a variety of normal tissues. It is a glycoprotein that is particularly abundant in kidney, where it is present on the brush border of proximal tubules and on glomerular epithelium. The protein is a neutral endopeptidase that cleaves peptides at the amino side of hydrophobic residues and inactivates several peptide hormones including glucagon, enkephalins, substance P, neurotensin, oxytocin, and bradykinin. This gene, which encodes a 100-kD type II transmembrane glycoprotein, exists in a single copy of greater than 45 kb. The 5’ untranslated region of this gene is alternatively spliced, resulting in four separate mRNA transcripts. The coding region is not affected by alternative splicing. 4311 ENSG00000196549 MME NA
fatty acid synthase The enzyme encoded by this gene is a multifunctional protein. Its main function is to catalyze the synthesis of palmitate from acetyl-CoA and malonyl-CoA, in the presence of NADPH, into long-chain saturated fatty acids. In some cancer cell lines, this protein has been found to be fused with estrogen receptor-alpha (ER-alpha), in which the N-terminus of FAS is fused in-frame with the C-terminus of ER-alpha. 2194 ENSG00000169710 FASN NA
stearoyl-CoA desaturase This gene encodes an enzyme involved in fatty acid biosynthesis, primarily the synthesis of oleic acid. The protein belongs to the fatty acid desaturase family and is an integral membrane protein located in the endoplasmic reticulum. Transcripts of approximately 3.9 and 5.2 kb, differing only by alternative polyadenylation signals, have been detected. A gene encoding a similar enzyme is located on chromosome 4 and a pseudogene of this gene is located on chromosome 17. 6319 ENSG00000099194 SCD NA
gremlin 1, DAN family BMP antagonist This gene encodes a member of the BMP (bone morphogenic protein) antagonist family. Like BMPs, BMP antagonists contain cystine knots and typically form homo- and heterodimers. The CAN (cerberus and dan) subfamily of BMP antagonists, to which this gene belongs, is characterized by a C-terminal cystine knot with an eight-membered ring. The antagonistic effect of the secreted glycosylated protein encoded by this gene is likely due to its direct binding to BMP proteins. As an antagonist of BMP, this gene may play a role in regulating organogenesis, body patterning, and tissue differentiation. In mouse, this protein has been shown to relay the sonic hedgehog (SHH) signal from the polarizing region to the apical ectodermal ridge during limb bud outgrowth. Alternatively spliced transcript variants encoding different isoforms have been found for this gene. 26585 ENSG00000166923 GREM1 NA
CD248 molecule NA 57124 ENSG00000174807 CD248 NA
matrix Gla protein The protein encoded by this gene is secreted and likely acts as an inhibitor of bone formation. The encoded protein is found in the organic matrix of bone and cartilage. Defects in this gene are a cause of Keutel syndrome (KS). Two transcript variants encoding different isoforms have been found for this gene. 4256 ENSG00000111341 MGP NA
thrombospondin 1 The protein encoded by this gene is a subunit of a disulfide-linked homotrimeric protein. This protein is an adhesive glycoprotein that mediates cell-to-cell and cell-to-matrix interactions. This protein can bind to fibrinogen, fibronectin, laminin, type V collagen and integrins alpha-V/beta-1. This protein has been shown to play roles in platelet aggregation, angiogenesis, and tumorigenesis. 7057 ENSG00000137801 THBS1 NA
insulin like growth factor binding protein 4 This gene is a member of the insulin-like growth factor binding protein (IGFBP) family and encodes a protein with an IGFBP domain and a thyroglobulin type-I domain. The protein binds both insulin-like growth factors (IGFs) I and II and circulates in the plasma in both glycosylated and non-glycosylated forms. Binding of this protein prolongs the half-life of the IGFs and alters their interaction with cell surface receptors. 3487 ENSG00000141753 IGFBP4 NA
retinol binding protein 4 This protein belongs to the lipocalin family and is the specific carrier for retinol (vitamin A alcohol) in the blood. It delivers retinol from the liver stores to the peripheral tissues. In plasma, the RBP-retinol complex interacts with transthyretin which prevents its loss by filtration through the kidney glomeruli. A deficiency of vitamin A blocks secretion of the binding protein posttranslationally and results in defective delivery and supply to the epidermal cells. 5950 ENSG00000138207 RBP4 NA
perilipin 4 Members of the perilipin family, such as PLIN4, coat intracellular lipid storage droplets (Wolins et al., 2003 [PubMed 12840023]). 729359 ENSG00000167676 PLIN4 NA
desmoplakin This gene encodes a protein that anchors intermediate filaments to desmosomal plaques and forms an obligate component of functional desmosomes. Mutations in this gene are the cause of several cardiomyopathies and keratodermas, including skin fragility-woolly hair syndrome. Alternative splicing results in multiple transcript variants. 1832 ENSG00000096696 DSP NA
LDL receptor related protein 1 This gene encodes a member of the low-density lipoprotein receptor family of proteins. The encoded preproprotein is proteolytically processed by furin to generate 515 kDa and 85 kDa subunits that form the mature receptor (PMID: 8546712). This receptor is involved in several cellular processes, including intracellular signaling, lipid homeostasis, and clearance of apoptotic cells. In addition, the encoded protein is necessary for the alpha 2-macroglobulin-mediated clearance of secreted amyloid precursor protein and beta-amyloid, the main component of amyloid plaques found in Alzheimer patients. Expression of this gene decreases with age and has been found to be lower than controls in brain tissue from Alzheimer’s disease patients. 4035 ENSG00000123384 LRP1 NA
complement component 3 Complement component C3 plays a central role in the activation of complement system. Its activation is required for both classical and alternative complement activation pathways. The encoded preproprotein is proteolytically processed to generate alpha and beta subunits that form the mature protein, which is then further processed to generate numerous peptide products. The C3a peptide, also known as the C3a anaphylatoxin, modulates inflammation and possesses antimicrobial activity. Mutations in this gene are associated with atypical hemolytic uremic syndrome and age-related macular degeneration in human patients. 718 ENSG00000125730 C3 NA
HOP homeobox The protein encoded by this gene is a homeodomain protein that lacks certain conserved residues required for DNA binding. It was reported that choriocarcinoma cell lines and tissues failed to express this gene, which suggested the possible involvement of this gene in malignant conversion of placental trophoblasts. Studies in mice suggest that this protein may interact with serum response factor (SRF) and modulate SRF-dependent cardiac-specific gene expression and cardiac development. Multiple alternatively spliced transcript variants have been identified for this gene. 84525 ENSG00000171476 HOPX NA
decorin This gene encodes a member of the small leucine-rich proteoglycan family of proteins. Alternative splicing results in multiple transcript variants, at least one of which encodes a preproprotein that is proteolytically processed to generate the mature protein. This protein plays a role in collagen fibril assembly. Binding of this protein to multiple cell surface receptors mediates its role in tumor suppression, including a stimulatory effect on autophagy and inflammation and an inhibitory effect on angiogenesis and tumorigenesis. This gene and the related gene biglycan are thought to be the result of a gene duplication. Mutations in this gene are associated with congenital stromal corneal dystrophy in human patients. 1634 ENSG00000011465 DCN NA
glycerol-3-phosphate acyltransferase, mitochondrial This gene encodes a mitochondrial enzyme which prefers saturated fatty acids as its substrate for the synthesis of glycerolipids. This metabolic pathway’s first step is catalyzed by the encoded enzyme. Two forms for this enzyme exist, one in the mitochondria and one in the endoplasmic reticulum. Two alternatively spliced transcript variants have been described for this gene. 57678 ENSG00000119927 GPAM NA
NA NA NA ENSG00000256545 NA TRUE
collagen type I alpha 1 This gene encodes the pro-alpha1 chains of type I collagen whose triple helix comprises two alpha1 chains and one alpha2 chain. Type I is a fibril-forming collagen found in most connective tissues and is abundant in bone, cornea, dermis and tendon. Mutations in this gene are associated with osteogenesis imperfecta types I-IV, Ehlers-Danlos syndrome type VIIA, Ehlers-Danlos syndrome Classical type, Caffey Disease and idiopathic osteoporosis. Reciprocal translocations between chromosomes 17 and 22, where this gene and the gene for platelet-derived growth factor beta are located, are associated with a particular type of skin tumor called dermatofibrosarcoma protuberans, resulting from unregulated expression of the growth factor. Two transcripts, resulting from the use of alternate polyadenylation signals, have been identified for this gene. 1277 ENSG00000108821 COL1A1 NA
nuclear receptor subfamily 4 group A member 1 This gene encodes a member of the steroid-thyroid hormone-retinoid receptor superfamily. Expression is induced by phytohemagglutinin in human lymphocytes and by serum stimulation of arrested fibroblasts. The encoded protein acts as a nuclear transcription factor. Translocation of the protein from the nucleus to mitochondria induces apoptosis. Multiple transcript variants encoding different isoforms have been found for this gene. 3164 ENSG00000123358 NR4A1 NA
keratin 14 This gene encodes a member of the keratin family, the most diverse group of intermediate filaments. This gene product, a type I keratin, is usually found as a heterotetramer with two keratin 5 molecules, a type II keratin. Together they form the cytoskeleton of epithelial cells. Mutations in the genes for these keratins are associated with epidermolysis bullosa simplex. At least one pseudogene has been identified at 17p12-p11. 3861 ENSG00000186847 KRT14 NA
acetyl-CoA carboxylase beta Acetyl-CoA carboxylase (ACC) is a complex multifunctional enzyme system. ACC is a biotin-containing enzyme which catalyzes the carboxylation of acetyl-CoA to malonyl-CoA, the rate-limiting step in fatty acid synthesis. ACC-beta is thought to control fatty acid oxidation by means of the ability of malonyl-CoA to inhibit carnitine-palmitoyl-CoA transferase I, the rate-limiting step in fatty acid uptake and oxidation by mitochondria. ACC-beta may be involved in the regulation of fatty acid oxidation, rather than fatty acid biosynthesis. There is evidence for the presence of two ACC-beta isoforms. 32 ENSG00000076555 ACACB NA
serine peptidase inhibitor, Kunitz type, 2 This gene encodes a transmembrane protein with two extracellular Kunitz domains that inhibits a variety of serine proteases. The protein inhibits HGF activator which prevents the formation of active hepatocyte growth factor. This gene is a putative tumor suppressor, and mutations in this gene result in congenital sodium diarrhea. Multiple transcript variants encoding different isoforms have been found for this gene. 10653 ENSG00000167642 SPINT2 NA
follistatin like 1 This gene encodes a protein with similarity to follistatin, an activin-binding protein. It contains an FS module, a follistatin-like sequence containing 10 conserved cysteine residues. This gene product is thought to be an autoantigen associated with rheumatoid arthritis. 11167 ENSG00000163430 FSTL1 NA
eukaryotic translation elongation factor 1 alpha 1 This gene encodes an isoform of the alpha subunit of the elongation factor-1 complex, which is responsible for the enzymatic delivery of aminoacyl tRNAs to the ribosome. This isoform (alpha 1) is expressed in brain, placenta, lung, liver, kidney, and pancreas, and the other isoform (alpha 2) is expressed in brain, heart and skeletal muscle. This isoform is identified as an autoantigen in 66% of patients with Felty syndrome. This gene has been found to have multiple copies on many chromosomes, some of which, if not all, represent different pseudogenes. 1915 ENSG00000156508 EEF1A1 NA
clusterin The protein encoded by this gene is a secreted chaperone that can under some stress conditions also be found in the cell cytosol. It has been suggested to be involved in several basic biological events such as cell death, tumor progression, and neurodegenerative disorders. Alternate splicing results in both coding and non-coding variants. 1191 ENSG00000120885 CLU NA
collagen type VI alpha 1 The collagens are a superfamily of proteins that play a role in maintaining the integrity of various tissues. Collagens are extracellular matrix proteins and have a triple-helical domain as their common structural element. Collagen VI is a major structural component of microfibrils. The basic structural unit of collagen VI is a heterotrimer of the alpha1(VI), alpha2(VI), and alpha3(VI) chains. The alpha2(VI) and alpha3(VI) chains are encoded by the COL6A2 and COL6A3 genes, respectively. The protein encoded by this gene is the alpha 1 subunit of type VI collagen (alpha1(VI) chain). Mutations in the genes that code for the collagen VI subunits result in the autosomal dominant disorder, Bethlem myopathy. 1291 ENSG00000142156 COL6A1 NA
complement component 1, s subcomponent This gene encodes a serine protease, which is a major constituent of the human complement subcomponent C1. C1s associates with two other complement components C1r and C1q in order to yield the first component of the serum complement system. Defects in this gene are the cause of selective C1s deficiency. 716 ENSG00000182326 C1S NA
NA NA NA ENSG00000117289 NA TRUE
matrix metallopeptidase 2 This gene is a member of the matrix metalloproteinase (MMP) gene family, that are zinc-dependent enzymes capable of cleaving components of the extracellular matrix and molecules involved in signal transduction. The protein encoded by this gene is a gelatinase A, type IV collagenase, that contains three fibronectin type II repeats in its catalytic site that allow binding of denatured type IV and V collagen and elastin. Unlike most MMP family members, activation of this protein can occur on the cell membrane. This enzyme can be activated extracellularly by proteases, or, intracellularly by its S-glutathiolation with no requirement for proteolytical removal of the pro-domain. This protein is thought to be involved in multiple pathways including roles in the nervous system, endometrial menstrual breakdown, regulation of vascularization, and metastasis. Mutations in this gene have been associated with Winchester syndrome and Nodulosis-Arthropathy-Osteolysis (NAO) syndrome. Alternative splicing results in multiple transcript variants encoding different isoforms. 4313 ENSG00000087245 MMP2 NA
lipase E, hormone sensitive type The protein encoded by this gene has a long and a short form, generated by use of alternative translational start codons. The long form is expressed in steroidogenic tissues such as testis, where it converts cholesteryl esters to free cholesterol for steroid hormone production. The short form is expressed in adipose tissue, among others, where it hydrolyzes stored triglycerides to free fatty acids. 3991 ENSG00000079435 LIPE NA
junction plakoglobin This gene encodes a major cytoplasmic protein which is the only known constituent common to submembranous plaques of both desmosomes and intermediate junctions. This protein forms distinct complexes with cadherins and desmosomal cadherins and is a member of the catenin family since it contains a distinct repeating amino acid motif called the armadillo repeat. Mutation in this gene has been associated with Naxos disease. Alternative splicing occurs in this gene; however, not all transcripts have been fully described. 3728 ENSG00000173801 JUP NA
keratin 13 The protein encoded by this gene is a member of the keratin gene family. The keratins are intermediate filament proteins responsible for the structural integrity of epithelial cells and are subdivided into cytokeratins and hair keratins. Most of the type I cytokeratins consist of acidic proteins which are arranged in pairs of heterotypic keratin chains. This type I cytokeratin is paired with keratin 4 and expressed in the suprabasal layers of non-cornified stratified epithelia. Mutations in this gene and keratin 4 have been associated with the autosomal dominant disorder White Sponge Nevus. The type I cytokeratins are clustered in a region of chromosome 17q21.2. Alternative splicing of this gene results in multiple transcript variants; however, not all variants have been described. 3860 ENSG00000171401 KRT13 NA
ATPase sarcoplasmic/endoplasmic reticulum Ca2+ transporting 2 This gene encodes one of the SERCA Ca(2+)-ATPases, which are intracellular pumps located in the sarcoplasmic or endoplasmic reticula of muscle cells. This enzyme catalyzes the hydrolysis of ATP coupled with the translocation of calcium from the cytosol into the sarcoplasmic reticulum lumen, and is involved in regulation of the contraction/relaxation cycle. Mutations in this gene cause Darier-White disease, also known as keratosis follicularis, an autosomal dominant skin disorder characterized by loss of adhesion between epidermal cells and abnormal keratinization. Alternative splicing results in multiple transcript variants encoding different isoforms. 488 ENSG00000174437 ATP2A2 NA
heat shock protein family B (small) member 7 NA 27129 ENSG00000173641 HSPB7 NA
carboxypeptidase A1 This gene encodes a member of the carboxypeptidase A family of zinc metalloproteases. This enzyme is produced in the pancreas and preferentially cleaves C-terminal branched-chain and aromatic amino acids from dietary proteins. This gene and several family members are present in a gene cluster on chromosome 7. Mutations in this gene may be linked to chronic pancreatitis, while elevated protein levels may be associated with pancreatic cancer. 1357 ENSG00000091704 CPA1 NA
actin, alpha 2, smooth muscle, aorta The protein encoded by this gene belongs to the actin family of proteins, which are highly conserved proteins that play a role in cell motility, structure and integrity. Alpha, beta and gamma actin isoforms have been identified, with alpha actins being a major constituent of the contractile apparatus, while beta and gamma actins are involved in the regulation of cell motility. This actin is an alpha actin that is found in skeletal muscle. Defects in this gene cause aortic aneurysm familial thoracic type 6. Multiple alternatively spliced variants, encoding the same protein, have been identified. 59 ENSG00000107796 ACTA2 NA
transforming growth factor beta receptor 2 This gene encodes a member of the Ser/Thr protein kinase family and the TGFB receptor subfamily. The encoded protein is a transmembrane protein that has a protein kinase domain, forms a heterodimeric complex with another receptor protein, and binds TGF-beta. This receptor/ligand complex phosphorylates proteins, which then enter the nucleus and regulate the transcription of a subset of genes related to cell proliferation. Mutations in this gene have been associated with Marfan Syndrome, Loeys-Dietz Aortic Aneurysm Syndrome, and the development of various types of tumors. Alternatively spliced transcript variants encoding different isoforms have been characterized. 7048 ENSG00000163513 TGFBR2 NA
glutamate-ammonia ligase The protein encoded by this gene belongs to the glutamine synthetase family. It catalyzes the synthesis of glutamine from glutamate and ammonia in an ATP-dependent reaction. This protein plays a role in ammonia and glutamate detoxification, acid-base homeostasis, cell signaling, and cell proliferation. Glutamine is an abundant amino acid, and is important to the biosynthesis of several amino acids, pyrimidines, and purines. Mutations in this gene are associated with congenital glutamine deficiency, and overexpression of this gene was observed in some primary liver cancer samples. There are six pseudogenes of this gene found on chromosomes 2, 5, 9, 11, and 12. Alternative splicing results in multiple transcript variants. 2752 ENSG00000135821 GLUL NA
aspartate beta-hydroxylase This gene is thought to play an important role in calcium homeostasis. The gene is expressed from two promoters and undergoes extensive alternative splicing. The encoded set of proteins share varying amounts of overlap near their N-termini but have substantial variations in their C-terminal domains resulting in distinct functional properties. The longest isoforms (a and f) include a C-terminal Aspartyl/Asparaginyl beta-hydroxylase domain that hydroxylates aspartic acid or asparagine residues in the epidermal growth factor (EGF)-like domains of some proteins, including protein C, coagulation factors VII, IX, and X, and the complement factors C1R and C1S. Other isoforms differ primarily in the C-terminal sequence and lack the hydroxylase domain, and some have been localized to the endoplasmic and sarcoplasmic reticulum. Some of these isoforms are found in complexes with calsequestrin, triadin, and the ryanodine receptor, and have been shown to regulate calcium release from the sarcoplasmic reticulum. Some isoforms have been implicated in metastasis. 444 ENSG00000198363 ASPH NA
cathepsin K The protein encoded by this gene is a lysosomal cysteine proteinase involved in bone remodeling and resorption. This protein, which is a member of the peptidase C1 protein family, is predominantly expressed in osteoclasts. However, the encoded protein is also expressed in a significant fraction of human breast cancers, where it could contribute to tumor invasiveness. Mutations in this gene are the cause of pycnodysostosis, an autosomal recessive disease characterized by osteosclerosis and short stature. 1513 ENSG00000143387 CTSK NA
protease, serine 1 This gene encodes a trypsinogen, which is a member of the trypsin family of serine proteases. This enzyme is secreted by the pancreas and cleaved to its active form in the small intestine. It is active on peptide linkages involving the carboxyl group of lysine or arginine. Mutations in this gene are associated with hereditary pancreatitis. This gene and several other trypsinogen genes are localized to the T cell receptor beta locus on chromosome 7. 5644 ENSG00000204983 PRSS1 NA
microsomal glutathione S-transferase 1 The MAPEG (Membrane Associated Proteins in Eicosanoid and Glutathione metabolism) family consists of six human proteins, two of which are involved in the production of leukotrienes and prostaglandin E, important mediators of inflammation. Other family members, demonstrating glutathione S-transferase and peroxidase activities, are involved in cellular defense against toxic, carcinogenic, and pharmacologically active electrophilic compounds. This gene encodes a protein that catalyzes the conjugation of glutathione to electrophiles and the reduction of lipid hydroperoxides. This protein is localized to the endoplasmic reticulum and outer mitochondrial membrane where it is thought to protect these membranes from oxidative stress. Several transcript variants, some non-protein coding and some protein coding, have been found for this gene. 4257 ENSG00000008394 MGST1 NA
keratin 1 The protein encoded by this gene is a member of the keratin gene family. The type II cytokeratins consist of basic or neutral proteins which are arranged in pairs of heterotypic keratin chains coexpressed during differentiation of simple and stratified epithelial tissues. This type II cytokeratin is specifically expressed in the spinous and granular layers of the epidermis with family member KRT10 and mutations in these genes have been associated with bullous congenital ichthyosiform erythroderma. The type II cytokeratins are clustered in a region of chromosome 12q12-q13. 3848 ENSG00000167768 KRT1 NA
insulin like growth factor binding protein 3 This gene is a member of the insulin-like growth factor binding protein (IGFBP) family and encodes a protein with an IGFBP domain and a thyroglobulin type-I domain. The protein forms a ternary complex with insulin-like growth factor acid-labile subunit (IGFALS) and either insulin-like growth factor (IGF) I or II. In this form, it circulates in the plasma, prolonging the half-life of IGFs and altering their interaction with cell surface receptors. Alternate transcriptional splice variants, encoding different isoforms, have been characterized. 3486 ENSG00000146674 IGFBP3 NA
CD36 molecule The protein encoded by this gene is the fourth major glycoprotein of the platelet surface and serves as a receptor for thrombospondin in platelets and various cell lines. Since thrombospondins are widely distributed proteins involved in a variety of adhesive processes, this protein may have important functions as a cell adhesion molecule. It binds to collagen, thrombospondin, anionic phospholipids and oxidized LDL. It directly mediates cytoadherence of Plasmodium falciparum parasitized erythrocytes and it binds long chain fatty acids and may function in the transport and/or as a regulator of fatty acid transport. Mutations in this gene cause platelet glycoprotein deficiency. Multiple alternatively spliced transcript variants have been found for this gene. 948 ENSG00000135218 CD36 NA
adrenomedullin The protein encoded by this gene is a preprohormone which is cleaved to form two biologically active peptides, adrenomedullin and proadrenomedullin N-terminal 20 peptide. Adrenomedullin is a 52 aa peptide with several functions, including vasodilation, regulation of hormone secretion, promotion of angiogenesis, and antimicrobial activity. The antimicrobial activity is antibacterial, as the peptide has been shown to kill E. coli and S. aureus at low concentration. 133 ENSG00000148926 ADM NA
keratin 10 This gene encodes a member of the type I (acidic) cytokeratin family, which belongs to the superfamily of intermediate filament (IF) proteins. Keratins are heteropolymeric structural proteins which form the intermediate filament. These filaments, along with actin microfilaments and microtubules, compose the cytoskeleton of epithelial cells. Mutations in this gene are associated with epidermolytic hyperkeratosis. This gene is located within a cluster of keratin family members on chromosome 17q21. 3858 ENSG00000186395 KRT10 NA
integrin subunit alpha 8 Integrins are heterodimeric transmembrane receptor proteins that mediate numerous cellular processes including cell adhesion, cytoskeletal rearrangement, and activation of cell signaling pathways. Integrins are composed of alpha and beta subunits. This gene encodes the alpha 8 subunit of the heterodimeric integrin alpha8beta1 protein. The encoded protein is a single-pass type 1 membrane protein that contains multiple FG-GAP repeats. This repeat is predicted to fold into a beta propeller structure. This gene regulates the recruitment of mesenchymal cells into epithelial structures, mediates cell-cell interactions, and regulates neurite outgrowth of sensory and motor neurons. The integrin alpha8beta1 protein thus plays an important role in wound-healing and organogenesis. Mutations in this gene have been associated with renal hypodysplasia/aplasia-1 (RHDA1) and with several animal models of chronic kidney disease. Alternate splicing results in multiple transcript variants encoding distinct isoforms. 8516 ENSG00000077943 ITGA8 NA
phosphoenolpyruvate carboxykinase 1 This gene is a main control point for the regulation of gluconeogenesis. The cytosolic enzyme encoded by this gene, along with GTP, catalyzes the formation of phosphoenolpyruvate from oxaloacetate, with the release of carbon dioxide and GDP. The expression of this gene can be regulated by insulin, glucocorticoids, glucagon, cAMP, and diet. Defects in this gene are a cause of cytosolic phosphoenolpyruvate carboxykinase deficiency. A mitochondrial isozyme of the encoded protein also has been characterized. 5105 ENSG00000124253 PCK1 NA
interleukin 6 signal transducer The protein encoded by this gene is a signal transducer shared by many cytokines, including interleukin 6 (IL6), ciliary neurotrophic factor (CNTF), leukemia inhibitory factor (LIF), and oncostatin M (OSM). This protein functions as a part of the cytokine receptor complex. The activation of this protein is dependent upon the binding of cytokines to their receptors. vIL6, a protein related to IL6 and encoded by the Kaposi sarcoma-associated herpesvirus, can bypass the interleukin 6 receptor (IL6R) and directly activate this protein. Knockout studies in mice suggest that this gene plays a critical role in regulating myocyte apoptosis. Alternatively spliced transcript variants have been described. A related pseudogene has been identified on chromosome 17. 3572 ENSG00000134352 IL6ST NA
caldesmon 1 This gene encodes a calmodulin- and actin-binding protein that plays an essential role in the regulation of smooth muscle and nonmuscle contraction. The conserved domain of this protein possesses the binding activities to Ca(2+)-calmodulin, actin, tropomyosin, myosin, and phospholipids. This protein is a potent inhibitor of the actin-tropomyosin activated myosin MgATPase, and serves as a mediating factor for Ca(2+)-dependent inhibition of smooth muscle contraction. Alternative splicing of this gene results in multiple transcript variants encoding distinct isoforms. 800 ENSG00000122786 CALD1 NA
glyceraldehyde-3-phosphate dehydrogenase This gene encodes a member of the glyceraldehyde-3-phosphate dehydrogenase protein family. The encoded protein has been identified as a moonlighting protein based on its ability to perform mechanistically distinct functions. The product of this gene catalyzes an important energy-yielding step in carbohydrate metabolism, the reversible oxidative phosphorylation of glyceraldehyde-3-phosphate in the presence of inorganic phosphate and nicotinamide adenine dinucleotide (NAD). The encoded protein has additionally been identified to have uracil DNA glycosylase activity in the nucleus. Also, this protein contains a peptide that has antimicrobial activity against E. coli, P. aeruginosa, and C. albicans. Studies of a similar protein in mouse have assigned a variety of additional functions including nitrosylation of nuclear proteins, the regulation of mRNA stability, and acting as a transferrin receptor on the cell surface of macrophage. Many pseudogenes similar to this locus are present in the human genome. Alternative splicing results in multiple transcript variants. 2597 ENSG00000111640 GAPDH NA
surfactant protein B This gene encodes the pulmonary-associated surfactant protein B (SPB), an amphipathic surfactant protein essential for lung function and homeostasis after birth. Pulmonary surfactant is a surface-active lipoprotein complex composed of 90% lipids and 10% proteins which include plasma proteins and apolipoproteins SPA, SPB, SPC and SPD. The surfactant is secreted by the alveolar cells of the lung and maintains the stability of pulmonary tissue by reducing the surface tension of fluids that coat the lung. The SPB enhances the rate of spreading and increases the stability of surfactant monolayers in vitro. Multiple mutations in this gene have been identified, which cause pulmonary surfactant metabolism dysfunction type 1, also called pulmonary alveolar proteinosis due to surfactant protein B deficiency, and are associated with fatal respiratory distress in the neonatal period. Alternatively spliced transcript variants encoding the same protein have been identified. 6439 ENSG00000168878 SFTPB NA
transforming growth factor beta induced This gene encodes an RGD-containing protein that binds to type I, II and IV collagens. The RGD motif is found in many extracellular matrix proteins modulating cell adhesion and serves as a ligand recognition sequence for several integrins. This protein plays a role in cell-collagen interactions and may be involved in endochondral bone formation in cartilage. The protein is induced by transforming growth factor-beta and acts to inhibit cell adhesion. Mutations in this gene are associated with multiple types of corneal dystrophy. 7045 ENSG00000120708 TGFBI NA
serum amyloid A1 This gene encodes a member of the serum amyloid A family of apolipoproteins. The encoded preproprotein is proteolytically processed to generate the mature protein. This protein is a major acute phase protein that is highly expressed in response to inflammation and tissue injury. This protein also plays an important role in HDL metabolism and cholesterol homeostasis. High levels of this protein are associated with chronic inflammatory diseases including atherosclerosis, rheumatoid arthritis, Alzheimer’s disease and Crohn’s disease. This protein may also be a potential biomarker for certain tumors. Alternate splicing results in multiple transcript variants that encode the same protein. A pseudogene of this gene is found on chromosome 11. 6288 ENSG00000173432 SAA1 NA
pancreatic lipase This gene is a member of the lipase gene family. It encodes a carboxyl esterase that hydrolyzes insoluble, emulsified triglycerides, and is essential for the efficient digestion of dietary fats. This gene is expressed specifically in the pancreas. 5406 ENSG00000175535 PNLIP NA
discoidin domain receptor tyrosine kinase 2 Receptor tyrosine kinases (RTKs) play a key role in the communication of cells with their microenvironment. These molecules are involved in the regulation of cell growth, differentiation, and metabolism. In several cases the biochemical mechanism by which RTKs transduce signals across the membrane has been shown to be ligand induced receptor oligomerization and subsequent intracellular phosphorylation. This autophosphorylation leads to phosphorylation of cytosolic targets as well as association with other molecules, which are involved in pleiotropic effects of signal transduction. RTKs have a tripartite structure with extracellular, transmembrane, and cytoplasmic regions. This gene encodes a member of a novel subclass of RTKs and contains a distinct extracellular region encompassing a factor VIII-like domain. Alternative splicing in the 5’ UTR results in multiple transcript variants encoding the same protein. 4921 ENSG00000162733 DDR2 NA
eukaryotic translation elongation factor 1 alpha 1 pseudogene 5 NA ENSG00000196205 ENSG00000196205 EEF1A1P5 NA
transforming growth factor beta receptor 3 This locus encodes the transforming growth factor (TGF)-beta type III receptor. The encoded receptor is a membrane proteoglycan that often functions as a co-receptor with other TGF-beta receptor superfamily members. Ectodomain shedding produces soluble TGFBR3, which may inhibit TGFB signaling. Decreased expression of this receptor has been observed in various cancers. Alternatively spliced transcript variants encoding different isoforms have been identified for this gene. 7049 ENSG00000069702 TGFBR3 NA
apolipoprotein D This gene encodes a component of high density lipoprotein that has no marked similarity to other apolipoprotein sequences. It has a high degree of homology to plasma retinol-binding protein and other members of the alpha 2 microglobulin protein superfamily of carrier proteins, also known as lipocalins. This glycoprotein is closely associated with the enzyme lecithin:cholesterol acyltransferase - an enzyme involved in lipoprotein metabolism. 347 ENSG00000189058 APOD NA
major histocompatibility complex, class I, B HLA-B belongs to the HLA class I heavy chain paralogues. This class I molecule is a heterodimer consisting of a heavy chain and a light chain (beta-2 microglobulin). The heavy chain is anchored in the membrane. Class I molecules play a central role in the immune system by presenting peptides derived from the endoplasmic reticulum lumen. They are expressed in nearly all cells. The heavy chain is approximately 45 kDa and its gene contains 8 exons. Exon 1 encodes the leader peptide, exon 2 and 3 encode the alpha1 and alpha2 domains, which both bind the peptide, exon 4 encodes the alpha3 domain, exon 5 encodes the transmembrane region and exons 6 and 7 encode the cytoplasmic tail. Polymorphisms within exon 2 and exon 3 are responsible for the peptide binding specificity of each class one molecule. Typing for these polymorphisms is routinely done for bone marrow and kidney transplantation. Hundreds of HLA-B alleles have been described. 3106 ENSG00000234745 HLA-B NA
DAB2, clathrin adaptor protein This gene encodes a mitogen-responsive phosphoprotein. It is expressed in normal ovarian epithelial cells, but is down-regulated or absent from ovarian carcinoma cell lines, suggesting its role as a tumor suppressor. This protein binds to the SH3 domains of GRB2, an adaptor protein that couples tyrosine kinase receptors to SOS (a guanine nucleotide exchange factor for Ras), via its C-terminal proline-rich sequences, and may thus modulate growth factor/Ras pathways by competing with SOS for binding to GRB2. Alternatively spliced transcript variants encoding different isoforms have been found for this gene. 1601 ENSG00000153071 DAB2 NA
lysyl oxidase like 2 This gene encodes a member of the lysyl oxidase gene family. The prototypic member of the family is essential to the biogenesis of connective tissue, encoding an extracellular copper-dependent amine oxidase that catalyses the first step in the formation of crosslinks in collagens and elastin. A highly conserved amino acid sequence at the C-terminus end appears to be sufficient for amine oxidase activity, suggesting that each family member may retain this function. The N-terminus is poorly conserved and may impart additional roles in developmental regulation, senescence, tumor suppression, cell growth control, and chemotaxis to each member of the family. 4017 ENSG00000134013 LOXL2 NA
carboxypeptidase B1 Three different procarboxypeptidases A and two different procarboxypeptidases B have been isolated. The B1 and B2 forms differ from each other mainly in isoelectric point. Carboxypeptidase B1 is a highly tissue-specific protein and is a useful serum marker for acute pancreatitis and dysfunction of pancreatic transplants. It is not elevated in pancreatic carcinoma. 1360 ENSG00000153002 CPB1 NA
EH domain containing 2 This gene encodes a member of the EH domain-containing protein family. These proteins are characterized by a C-terminal EF-hand domain, a nucleotide-binding consensus site at the N terminus and a bipartite nuclear localization signal. The encoded protein interacts with the actin cytoskeleton through an N-terminal domain and also binds to an EH domain-binding protein through the C-terminal EH domain. This interaction appears to connect clathrin-dependent endocytosis to actin, suggesting that this gene product participates in the endocytic pathway. 30846 ENSG00000024422 EHD2 NA
desmin This gene encodes a muscle-specific class III intermediate filament. Homopolymers of this protein form a stable intracytoplasmic filamentous network connecting myofibrils to each other and to the plasma membrane. Mutations in this gene are associated with desmin-related myopathy, a familial cardiac and skeletal myopathy (CSM), and with distal myopathies. 1674 ENSG00000175084 DES NA
nicotinamide N-methyltransferase N-methylation is one method by which drug and other xenobiotic compounds are metabolized by the liver. This gene encodes the protein responsible for this enzymatic activity which uses S-adenosyl methionine as the methyl donor. 4837 ENSG00000166741 NNMT NA
fibrillin 1 This gene encodes a member of the fibrillin family of proteins. The encoded preproprotein is proteolytically processed to generate two proteins including the extracellular matrix component fibrillin-1 and the protein hormone asprosin. Fibrillin-1 is an extracellular matrix glycoprotein that serves as a structural component of calcium-binding microfibrils. These microfibrils provide force-bearing structural support in elastic and nonelastic connective tissue throughout the body. Asprosin, secreted by white adipose tissue, has been shown to regulate glucose homeostasis. Mutations in this gene are associated with Marfan syndrome and the related MASS phenotype, as well as ectopia lentis syndrome, Weill-Marchesani syndrome, Shprintzen-Goldberg syndrome and neonatal progeroid syndrome. 2200 ENSG00000166147 FBN1 NA
glycoprotein 2 This gene encodes an integral membrane protein that is secreted from intracellular zymogen granules and associates with the plasma membrane via glycosylphosphatidylinositol (GPI) linkage. The encoded protein binds pathogens such as enterobacteria, thereby playing an important role in the innate immune response. The C-terminus of this protein is related to the C-terminus of the protein encoded by the neighboring gene, uromodulin (UMOD). Alternative splicing results in multiple transcript variants. 2813 ENSG00000169347 GP2 NA
syndecan 1 The protein encoded by this gene is a transmembrane (type I) heparan sulfate proteoglycan and is a member of the syndecan proteoglycan family. The syndecans mediate cell binding, cell signaling, and cytoskeletal organization and syndecan receptors are required for internalization of the HIV-1 tat protein. The syndecan-1 protein functions as an integral membrane protein and participates in cell proliferation, cell migration and cell-matrix interactions via its receptor for extracellular matrix proteins. Altered syndecan-1 expression has been detected in several different tumor types. While several transcript variants may exist for this gene, the full-length natures of only two have been described to date. These two represent the major variants of this gene and encode the same protein. 6382 ENSG00000115884 SDC1 NA
complement C1r subcomponent NA 715 ENSG00000159403 C1R NA
aconitase 1 The protein encoded by this gene is a bifunctional, cytosolic protein that functions as an essential enzyme in the TCA cycle and interacts with mRNA to control the levels of iron inside cells. When cellular iron levels are high, this protein binds to a 4Fe-4S cluster and functions as an aconitase. Aconitases are iron-sulfur proteins that function to catalyze the conversion of citrate to isocitrate. When cellular iron levels are low, the protein binds to iron-responsive elements (IREs), which are stem-loop structures found in the 5’ UTR of ferritin mRNA, and in the 3’ UTR of transferrin receptor mRNA. When the protein binds to IRE, it results in repression of translation of ferritin mRNA, and inhibition of degradation of the otherwise rapidly degraded transferrin receptor mRNA. The encoded protein has been identified as a moonlighting protein based on its ability to perform mechanistically distinct functions. Alternative splicing results in multiple transcript variants. 48 ENSG00000122729 ACO1 NA
ectonucleotide pyrophosphatase/phosphodiesterase 2 The protein encoded by this gene functions as both a phosphodiesterase, which cleaves phosphodiester bonds at the 5’ end of oligonucleotides, and a phospholipase, which catalyzes production of lysophosphatidic acid (LPA) in extracellular fluids. LPA evokes growth factor-like responses including stimulation of cell proliferation and chemotaxis. This gene product stimulates the motility of tumor cells and has angiogenic properties, and its expression is upregulated in several kinds of carcinomas. The gene product is secreted and further processed to make the biologically active form. Several alternatively spliced transcript variants encoding different isoforms have been identified. 5168 ENSG00000136960 ENPP2 NA
sparc/osteonectin, cwcv and kazal-like domains proteoglycan (testican) 1 This gene encodes the protein core of a seminal plasma proteoglycan containing chondroitin- and heparan-sulfate chains. The protein’s function is unknown, although similarity to thyropin-type cysteine protease-inhibitors suggests its function may be related to protease inhibition. 6695 ENSG00000152377 SPOCK1 NA
immunoglobulin heavy constant gamma 1 (G1m marker) NA ENSG00000211896 ENSG00000211896 IGHG1 NA
dermokine This gene is upregulated in inflammatory diseases, and it was first observed as expressed in the differentiated layers of skin. The most interesting aspect of this gene is the differential use of promoters and terminators to generate isoforms with unique cellular distributions and domain components. Alternatively spliced transcript variants encoding different isoforms have been identified for this gene. 93099 ENSG00000161249 DMKN NA
eukaryotic translation elongation factor 2 This gene encodes a member of the GTP-binding translation elongation factor family. This protein is an essential factor for protein synthesis. It promotes the GTP-dependent translocation of the nascent protein chain from the A-site to the P-site of the ribosome. This protein is completely inactivated by EF-2 kinase phosphorylation. 1938 ENSG00000167658 EEF2 NA
TIMP metallopeptidase inhibitor 4 This gene belongs to the TIMP gene family. The proteins encoded by this gene family are inhibitors of the matrix metalloproteinases, a group of peptidases involved in degradation of the extracellular matrix. The secreted, netrin domain-containing protein encoded by this gene is involved in regulation of platelet aggregation and recruitment and may play a role in hormonal regulation and endometrial tissue remodeling. 7079 ENSG00000157150 TIMP4 NA
cathepsin B This gene encodes a member of the C1 family of peptidases. Alternative splicing of this gene results in multiple transcript variants. At least one of these variants encodes a preproprotein that is proteolytically processed to generate multiple protein products. These products include the cathepsin B light and heavy chains, which can dimerize to form the double chain form of the enzyme. This enzyme is a lysosomal cysteine protease with both endopeptidase and exopeptidase activity that may play a role in protein turnover. It is also known as amyloid precursor protein secretase and is involved in the proteolytic processing of amyloid precursor protein (APP). Incomplete proteolytic processing of APP has been suggested to be a causative factor in Alzheimer’s disease, the most common cause of dementia. Overexpression of the encoded protein has been associated with esophageal adenocarcinoma and other tumors. Multiple pseudogenes of this gene have been identified. 1508 ENSG00000164733 CTSB NA
protocadherin 18 This gene belongs to the protocadherin gene family, a subfamily of the cadherin superfamily. This gene encodes a protein which contains 6 extracellular cadherin domains, a transmembrane domain and a cytoplasmic tail differing from those of the classical cadherins. Although its specific function is undetermined, the cadherin-related neuronal receptor is thought to play a role in the establishment and function of specific cell-cell connections in the brain. 54510 ENSG00000189184 PCDH18 NA
chymotrypsin like elastase family member 3A Elastases form a subfamily of serine proteases that hydrolyze many proteins in addition to elastin. Humans have six elastase genes which encode the structurally similar proteins elastase 1, 2, 2A, 2B, 3A, and 3B. Unlike other elastases, elastase 3A has little elastolytic activity. Like most of the human elastases, elastase 3A is secreted from the pancreas as a zymogen and, like other serine proteases such as trypsin, chymotrypsin and kallikrein, it has a digestive function in the intestine. Elastase 3A preferentially cleaves proteins after alanine residues. Elastase 3A may also function in the intestinal transport and metabolism of cholesterol. Both elastase 3A and elastase 3B have been referred to as protease E and as elastase 1. 10136 ENSG00000142789 CELA3A NA
septin 11 SEPT11 belongs to the conserved septin family of filament-forming cytoskeletal GTPases that are involved in a variety of cellular functions including cytokinesis and vesicle trafficking (Hanai et al., 2004 [PubMed 15196925]; Nagata et al., 2004 [PubMed 15485874]). 55752 ENSG00000138758 SEPT11 NA
myosin light chain kinase This gene, a muscle member of the immunoglobulin gene superfamily, encodes myosin light chain kinase which is a calcium/calmodulin dependent enzyme. This kinase phosphorylates myosin regulatory light chains to facilitate myosin interaction with actin filaments to produce contractile activity. This gene encodes both smooth muscle and nonmuscle isoforms. In addition, using a separate promoter in an intron in the 3’ region, it encodes telokin, a small protein identical in sequence to the C-terminus of myosin light chain kinase, that is independently expressed in smooth muscle and functions to stabilize unphosphorylated myosin filaments. A pseudogene is located on the p arm of chromosome 3. Four transcript variants that produce four isoforms of the calcium/calmodulin dependent enzyme have been identified as well as two transcripts that produce two isoforms of telokin. Additional variants have been identified but lack full length transcripts. 4638 ENSG00000065534 MYLK NA
REV3 like, DNA directed polymerase zeta catalytic subunit NA 5980 ENSG00000009413 REV3L NA
keratin 4 The protein encoded by this gene is a member of the keratin gene family. The type II cytokeratins consist of basic or neutral proteins which are arranged in pairs of heterotypic keratin chains coexpressed during differentiation of simple and stratified epithelial tissues. This type II cytokeratin is specifically expressed in differentiated layers of the mucosal and esophageal epithelia with family member KRT13. Mutations in these genes have been associated with White Sponge Nevus, characterized by oral, esophageal, and anal leukoplakia. The type II cytokeratins are clustered in a region of chromosome 12q12-q13. 3851 ENSG00000170477 KRT4 NA
ERBB receptor feedback inhibitor 1 ERRFI1 is a cytoplasmic protein whose expression is upregulated with cell growth (Wick et al., 1995 [PubMed 7641805]). It shares significant homology with the protein product of rat gene-33, which is induced during cell stress and mediates cell signaling (Makkinje et al., 2000 [PubMed 10749885]; Fiorentino et al., 2000 [PubMed 11003669]). 54206 ENSG00000116285 ERRFI1 NA
protamine 2 Protamines substitute for histones in the chromatin of sperm during the haploid phase of spermatogenesis, and are the major DNA-binding proteins in the nucleus of sperm in many vertebrates. They package the sperm DNA into a highly condensed complex in a volume less than 5% of a somatic cell nucleus. Many mammalian species have only one protamine (protamine 1); however, a few species, including human and mouse, have two. This gene encodes protamine 2, which is cleaved to give rise to a family of protamine 2 peptides. Alternatively spliced transcript variants have also been found for this gene. 5620 ENSG00000122304 PRM2 NA
carboxyl ester lipase The protein encoded by this gene is a glycoprotein secreted from the pancreas into the digestive tract and from the lactating mammary gland into human milk. The physiological role of this protein is in cholesterol and lipid-soluble vitamin ester hydrolysis and absorption. This encoded protein promotes large chylomicron production in the intestine. Also its presence in plasma suggests its interactions with cholesterol and oxidized lipoproteins to modulate the progression of atherosclerosis. In pancreatic tumoral cells, this encoded protein is thought to be sequestrated within the Golgi compartment and is probably not secreted. This gene contains a variable number of tandem repeat (VNTR) polymorphism in the coding region that may influence the function of the encoded protein. 1056 ENSG00000170835 CEL NA
coiled-coil domain containing 80 NA 151887 ENSG00000091986 CCDC80 NA
cell death inducing DFFA like effector c This gene encodes a member of the cell death-inducing DNA fragmentation factor-like effector family. Members of this family play important roles in apoptosis. The encoded protein promotes lipid droplet formation in adipocytes and may mediate adipocyte apoptosis. This gene is regulated by insulin and its expression is positively correlated with insulin sensitivity. Mutations in this gene may contribute to insulin resistant diabetes. A pseudogene of this gene is located on the short arm of chromosome 3. Alternatively spliced transcript variants that encode different isoforms have been observed for this gene. 63924 ENSG00000187288 CIDEC NA
1-acylglycerol-3-phosphate O-acyltransferase 2 This gene encodes a member of the 1-acylglycerol-3-phosphate O-acyltransferase family. The protein is located within the endoplasmic reticulum membrane and converts lysophosphatidic acid to phosphatidic acid, the second step in de novo phospholipid biosynthesis. Mutations in this gene have been associated with congenital generalized lipodystrophy (CGL), or Berardinelli-Seip syndrome, a disease characterized by a near absence of adipose tissue and severe insulin resistance. Alternate transcriptional splice variants, encoding different isoforms, have been characterized. 10555 ENSG00000169692 AGPAT2 NA
creatine kinase, M-type The protein encoded by this gene is a cytoplasmic enzyme involved in energy homeostasis and is an important serum marker for myocardial infarction. The encoded protein reversibly catalyzes the transfer of phosphate between ATP and various phosphogens such as creatine phosphate. It acts as a homodimer in striated muscle as well as in other tissues, and as a heterodimer with a similar brain isozyme in heart. The encoded protein is a member of the ATP:guanido phosphotransferase protein family. 1158 ENSG00000104879 CKM NA
acyl-CoA synthetase long-chain family member 1 The protein encoded by this gene is an isozyme of the long-chain fatty-acid-coenzyme A ligase family. Although differing in substrate specificity, subcellular localization, and tissue distribution, all isozymes of this family convert free long-chain fatty acids into fatty acyl-CoA esters, and thereby play a key role in lipid biosynthesis and fatty acid degradation. Several transcript variants encoding different isoforms have been found for this gene. 2180 ENSG00000151726 ACSL1 NA
ribosomal protein S2 Ribosomes, the organelles that catalyze protein synthesis, consist of a small 40S subunit and a large 60S subunit. Together these subunits are composed of 4 RNA species and approximately 80 structurally distinct proteins. This gene encodes a ribosomal protein that is a component of the 40S subunit. The protein belongs to the S5P family of ribosomal proteins. It is located in the cytoplasm. This gene shares sequence similarity with mouse LLRep3. It is co-transcribed with the small nucleolar RNA gene U64, which is located in its third intron. As is typical for genes encoding ribosomal proteins, there are multiple processed pseudogenes of this gene dispersed through the genome. 6187 ENSG00000140988 RPS2 NA
complement factor D This gene encodes a member of the S1, or chymotrypsin, family of serine peptidases. This protease catalyzes the cleavage of factor B, the rate-limiting step of the alternative pathway of complement activation. This protein also functions as an adipokine, a cell signaling protein secreted by adipocytes, which regulates insulin secretion in mice. Mutations in this gene underlie complement factor D deficiency, which is associated with recurrent bacterial meningitis infections in human patients. Alternative splicing of this gene results in multiple transcript variants. At least one of these variants encodes a preproprotein that is proteolytically processed to generate the mature protease. 1675 ENSG00000197766 CFD NA
# Write the Ensembl gene IDs queried for factor 4 to a text file, one ID per line.
write.table(as.factor(out$query), paste0("../utilities/GTEX2013_sparse_load_sqrt/gene_names_clus_", 4, ".txt"),
            col.names = FALSE, row.names = FALSE, quote = FALSE)
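
The annotation tables rendered by `kable(as.data.frame(out))` in this report do not always list their columns in the same order, and some carry a trailing column that is entirely `NA`. The following is a minimal sketch of how the data frame could be tidied before it is passed to `kable`. The helper name `tidy_annotations`, the preferred column order, and the toy data frame (including its all-`NA` column `extra`) are assumptions made for illustration; they are not part of the original analysis.

```r
# Hypothetical helper (not part of the original analysis): put the annotation
# columns in a fixed order and drop columns that contain only NA values.
tidy_annotations <- function(df,
                             col_order = c("query", "symbol", "name", "summary")) {
  df <- as.data.frame(df, stringsAsFactors = FALSE)

  # Drop every column whose entries are all NA.
  all_na <- vapply(df, function(col) all(is.na(col)), logical(1))
  df <- df[, !all_na, drop = FALSE]

  # Preferred columns first (when present), then any remaining columns.
  first <- intersect(col_order, names(df))
  rest <- setdiff(names(df), first)
  df[, c(first, rest), drop = FALSE]
}

# Toy stand-in for as.data.frame(out); the two rows reuse values shown in the
# Factor 5 table, and `extra` is an assumed all-NA column.
toy <- data.frame(
  X_id = c("6616", "3043"),
  symbol = c("SNAP25", "HBB"),
  query = c("ENSG00000132639", "ENSG00000244734"),
  name = c("synaptosome associated protein 25", "hemoglobin subunit beta"),
  extra = c(NA, NA),
  stringsAsFactors = FALSE
)

tidy <- tidy_annotations(toy)
print(names(tidy))
```

Under these assumptions, a call such as `kable(tidy_annotations(out))` would give every factor's table the same leading columns, which makes the tables easier to compare across factors.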

Factor 5 Annotations

out <- mygene::queryMany(gene_list[5,],  scopes="ensembl.gene", fields=c("name", "summary", "symbol"), species="human");
## Finished
## Pass returnall=TRUE to return lists of duplicate or missing query terms.
kable(as.data.frame(out))
X_id symbol summary query name
6616 SNAP25 Synaptic vesicle membrane docking and fusion is mediated by SNAREs (soluble N-ethylmaleimide-sensitive factor attachment protein receptors) located on the vesicle membrane (v-SNAREs) and the target membrane (t-SNAREs). The assembled v-SNARE/t-SNARE complex consists of a bundle of four helices, one of which is supplied by v-SNARE and the other three by t-SNARE. For t-SNAREs on the plasma membrane, the protein syntaxin supplies one helix and the protein encoded by this gene contributes the other two. Therefore, this gene product is a presynaptic plasma membrane protein involved in the regulation of neurotransmitter release. Two alternative transcript variants encoding different protein isoforms have been described for this gene. ENSG00000132639 synaptosome associated protein 25
3043 HBB The alpha (HBA) and beta (HBB) loci determine the structure of the 2 types of polypeptide chains in adult hemoglobin, Hb A. The normal adult hemoglobin tetramer consists of two alpha chains and two beta chains. Mutant beta globin causes sickle cell anemia. Absence of beta chain causes beta-zero-thalassemia. Reduced amounts of detectable beta globin causes beta-plus-thalassemia. The order of the genes in the beta-globin cluster is 5’-epsilon – gamma-G – gamma-A – delta – beta–3’. ENSG00000244734 hemoglobin subunit beta
801 CALM1 This gene encodes a member of the EF-hand calcium-binding protein family. It is one of three genes which encode an identical calcium binding protein which is one of the four subunits of phosphorylase kinase. Two pseudogenes have been identified on chromosome 7 and X. Multiple transcript variants encoding different isoforms have been found for this gene. ENSG00000198668 calmodulin 1 (phosphorylase kinase, delta)
805 CALM2 This gene is a member of the calmodulin gene family. There are three distinct calmodulin genes dispersed throughout the genome that encode the identical protein, but differ at the nucleotide level. Calmodulin is a calcium binding protein that plays a role in signaling pathways, cell cycle progression and proliferation. Several infants with severe forms of long-QT syndrome (LQTS) who displayed life-threatening ventricular arrhythmias together with delayed neurodevelopment and epilepsy were found to have mutations in either this gene or another member of the calmodulin gene family (PMID:23388215). Mutations in this gene have also been identified in patients with less severe forms of LQTS (PMID:24917665), while mutations in another calmodulin gene family member have been associated with catecholaminergic polymorphic ventricular tachycardia (CPVT)(PMID:23040497), a rare disorder thought to be the cause of a significant fraction of sudden cardiac deaths in young individuals. Pseudogenes of this gene are found on chromosomes 10, 13, and 17. Alternative splicing results in multiple transcript variants encoding different isoforms. ENSG00000198668 calmodulin 2 (phosphorylase kinase, delta)
1277 COL1A1 This gene encodes the pro-alpha1 chains of type I collagen whose triple helix comprises two alpha1 chains and one alpha2 chain. Type I is a fibril-forming collagen found in most connective tissues and is abundant in bone, cornea, dermis and tendon. Mutations in this gene are associated with osteogenesis imperfecta types I-IV, Ehlers-Danlos syndrome type VIIA, Ehlers-Danlos syndrome Classical type, Caffey Disease and idiopathic osteoporosis. Reciprocal translocations between chromosomes 17 and 22, where this gene and the gene for platelet-derived growth factor beta are located, are associated with a particular type of skin tumor called dermatofibrosarcoma protuberans, resulting from unregulated expression of the growth factor. Two transcripts, resulting from the use of alternate polyadenylation signals, have been identified for this gene. ENSG00000108821 collagen type I alpha 1
1917 EEF1A2 This gene encodes an isoform of the alpha subunit of the elongation factor-1 complex, which is responsible for the enzymatic delivery of aminoacyl tRNAs to the ribosome. This isoform (alpha 2) is expressed in brain, heart and skeletal muscle, and the other isoform (alpha 1) is expressed in brain, placenta, lung, liver, kidney, and pancreas. This gene may be critical in the development of ovarian cancer. ENSG00000101210 eukaryotic translation elongation factor 1 alpha 2
10382 TUBB4A This gene encodes a member of the beta tubulin family. Beta tubulins are one of two core protein families (alpha and beta tubulins) that heterodimerize and assemble to form microtubules. Mutations in this gene cause hypomyelinating leukodystrophy-6 and autosomal dominant torsion dystonia-4. Alternate splicing results in multiple transcript variants encoding different isoforms. A pseudogene of this gene is found on chromosome X. ENSG00000104833 tubulin beta 4A class IVa
3798 KIF5A This gene encodes a member of the kinesin family of proteins. Members of this family are part of a multisubunit complex that functions as a microtubule motor in intracellular organelle transport. Mutations in this gene cause autosomal dominant spastic paraplegia 10. ENSG00000155980 kinesin family member 5A
477 ATP1A2 The protein encoded by this gene belongs to the family of P-type cation transport ATPases, and to the subfamily of Na+/K+ -ATPases. Na+/K+ -ATPase is an integral membrane protein responsible for establishing and maintaining the electrochemical gradients of Na and K ions across the plasma membrane. These gradients are essential for osmoregulation, for sodium-coupled transport of a variety of organic and inorganic molecules, and for electrical excitability of nerve and muscle. This enzyme is composed of two subunits, a large catalytic subunit (alpha) and a smaller glycoprotein subunit (beta). The catalytic subunit of Na+/K+ -ATPase is encoded by multiple genes. This gene encodes an alpha 2 subunit. Mutations in this gene result in familial basilar or hemiplegic migraines, and in a rare syndrome known as alternating hemiplegia of childhood. ENSG00000018625 ATPase Na+/K+ transporting subunit alpha 2
65009 NDRG4 This gene is a member of the N-myc downregulated gene family which belongs to the alpha/beta hydrolase superfamily. The protein encoded by this gene is a cytoplasmic protein that is required for cell cycle progression and survival in primary astrocytes and may be involved in the regulation of mitogenic signalling in vascular smooth muscles cells. Alternative splicing results in multiple transcripts encoding different isoforms. ENSG00000103034 NDRG family member 4
567 B2M This gene encodes a serum protein found in association with the major histocompatibility complex (MHC) class I heavy chain on the surface of nearly all nucleated cells. The protein has a predominantly beta-pleated sheet structure that can form amyloid fibrils in some pathological conditions. The encoded antimicrobial protein displays antibacterial activity in amniotic fluid. A mutation in this gene has been shown to result in hypercatabolic hypoproteinemia. ENSG00000166710 beta-2-microglobulin
816 CAMK2B The product of this gene belongs to the serine/threonine protein kinase family and to the Ca(2+)/calmodulin-dependent protein kinase subfamily. Calcium signaling is crucial for several aspects of plasticity at glutamatergic synapses. In mammalian cells, the enzyme is composed of four different chains: alpha, beta, gamma, and delta. The product of this gene is a beta chain. It is possible that distinct isoforms of this chain have different cellular localizations and interact differently with calmodulin. Alternative splicing results in multiple transcript variants. ENSG00000058404 calcium/calmodulin dependent protein kinase II beta
1152 CKB The protein encoded by this gene is a cytoplasmic enzyme involved in energy homeostasis. The encoded protein reversibly catalyzes the transfer of phosphate between ATP and various phosphogens such as creatine phosphate. It acts as a homodimer in brain as well as in other tissues, and as a heterodimer with a similar muscle isozyme in heart. The encoded protein is a member of the ATP:guanido phosphotransferase protein family. A pseudogene of this gene has been characterized. ENSG00000166165 creatine kinase B
57030 SLC17A7 The protein encoded by this gene is a vesicle-bound, sodium-dependent phosphate transporter that is specifically expressed in the neuron-rich regions of the brain. It is preferentially associated with the membranes of synaptic vesicles and functions in glutamate transport. The protein shares 82% identity with the differentiation-associated Na-dependent inorganic phosphate cotransporter and they appear to form a distinct class within the Na+/Pi cotransporter family. ENSG00000104888 solute carrier family 17 member 7
230 ALDOC This gene encodes a member of the class I fructose-biphosphate aldolase gene family. Expressed specifically in the hippocampus and Purkinje cells of the brain, the encoded protein is a glycolytic enzyme that catalyzes the reversible aldol cleavage of fructose-1,6-biphosphate and fructose 1-phosphate to dihydroxyacetone phosphate and either glyceraldehyde-3-phosphate or glyceraldehyde, respectively. ENSG00000109107 aldolase, fructose-bisphosphate C
1292 COL6A2 This gene encodes one of the three alpha chains of type VI collagen, a beaded filament collagen found in most connective tissues. The product of this gene contains several domains similar to von Willebrand Factor type A domains. These domains have been shown to bind extracellular matrix proteins, an interaction that explains the importance of this collagen in organizing matrix components. Mutations in this gene are associated with Bethlem myopathy and Ullrich scleroatonic muscular dystrophy. Three transcript variants have been identified for this gene. ENSG00000142173 collagen type VI alpha 2
7447 VSNL1 This gene is a member of the visinin/recoverin subfamily of neuronal calcium sensor proteins. The encoded protein is strongly expressed in granule cells of the cerebellum where it associates with membranes in a calcium-dependent manner and modulates intracellular signaling pathways of the central nervous system by directly or indirectly regulating the activity of adenylyl cyclase. Alternatively spliced transcript variants have been observed, but their full-length nature has not been determined. ENSG00000163032 visinin like 1
1281 COL3A1 This gene encodes the pro-alpha1 chains of type III collagen, a fibrillar collagen that is found in extensible connective tissues such as skin, lung, uterus, intestine and the vascular system, frequently in association with type I collagen. Mutations in this gene are associated with Ehlers-Danlos syndrome types IV, and with aortic and arterial aneurysms. Two transcripts, resulting from the use of alternate polyadenylation signals, have been identified for this gene. ENSG00000168542 collagen type III alpha 1 chain
3040 HBA2 The human alpha globin gene cluster located on chromosome 16 spans about 30 kb and includes seven loci: 5’- zeta - pseudozeta - mu - pseudoalpha-1 - alpha-2 - alpha-1 - theta - 3’. The alpha-2 (HBA2) and alpha-1 (HBA1) coding sequences are identical. These genes differ slightly over the 5’ untranslated regions and the introns, but they differ significantly over the 3’ untranslated regions. Two alpha chains plus two beta chains constitute HbA, which in normal adult life comprises about 97% of the total hemoglobin; alpha chains combine with delta chains to constitute HbA-2, which with HbF (fetal hemoglobin) makes up the remaining 3% of adult hemoglobin. Alpha thalassemias result from deletions of each of the alpha genes as well as deletions of both HBA2 and HBA1; some nondeletion alpha thalassemias have also been reported. ENSG00000188536 hemoglobin subunit alpha 2
6252 RTN1 This gene belongs to the family of reticulon encoding genes. Reticulons are associated with the endoplasmic reticulum, and are involved in neuroendocrine secretion or in membrane trafficking in neuroendocrine cells. This gene is considered to be a specific marker for neurological diseases and cancer, and is a potential molecular target for therapy. Alternative splicing results in multiple transcript variants. ENSG00000139970 reticulon 1
9796 PHYHIP NA ENSG00000168490 phytanoyl-CoA 2-hydroxylase interacting protein
57447 NDRG2 This gene is a member of the N-myc downregulated gene family which belongs to the alpha/beta hydrolase superfamily. The protein encoded by this gene is a cytoplasmic protein that may play a role in neurite outgrowth. This gene may be involved in glioblastoma carcinogenesis. Several alternatively spliced transcript variants of this gene have been described, but the full-length nature of some of these variants has not been determined. ENSG00000165795 NDRG family member 2
3800 KIF5C The protein encoded by this gene is a kinesin heavy chain subunit involved in the transport of cargo within the central nervous system. The encoded protein, which acts as a tetramer by associating with another heavy chain and two light chains, interacts with protein kinase CK2. Mutations in this gene have been associated with complex cortical dysplasia with other brain malformations-2. Two transcript variants, one protein-coding and the other non-protein coding, have been found for this gene. ENSG00000168280 kinesin family member 5C
6507 SLC1A3 This gene encodes a member of a high affinity glutamate transporter family. This gene functions in the termination of excitatory neurotransmission in the central nervous system. Mutations are associated with episodic ataxia, Type 6. Alternative splicing results in multiple transcript variants. ENSG00000079215 solute carrier family 1 member 3
1114 CHGB This gene encodes a tyrosine-sulfated secretory protein abundant in peptidergic endocrine cells and neurons. This protein may serve as a precursor for regulatory peptides. ENSG00000089199 chromogranin B
4130 MAP1A This gene encodes a protein that belongs to the microtubule-associated protein family. The proteins of this family are thought to be involved in microtubule assembly, which is an essential step in neurogenesis. The product of this gene is a precursor polypeptide that presumably undergoes proteolytic processing to generate the final MAP1A heavy chain and LC2 light chain. Expression of this gene is almost exclusively in the brain. Studies of the rat microtubule-associated protein 1A gene suggested a role in early events of spinal cord development. ENSG00000166963 microtubule associated protein 1A
2670 GFAP This gene encodes one of the major intermediate filament proteins of mature astrocytes. It is used as a marker to distinguish astrocytes from other glial cells during development. Mutations in this gene cause Alexander disease, a rare disorder of astrocytes in the central nervous system. Alternative splicing results in multiple transcript variants encoding distinct isoforms. ENSG00000131095 glial fibrillary acidic protein
6812 STXBP1 This gene encodes a syntaxin-binding protein. The encoded protein appears to play a role in release of neurotransmitters via regulation of syntaxin, a transmembrane attachment protein receptor. Mutations in this gene have been associated with infantile epileptic encephalopathy-4. Alternatively spliced transcript variants have been described. ENSG00000136854 syntaxin binding protein 1
1759 DNM1 This gene encodes a member of the dynamin subfamily of GTP-binding proteins. The encoded protein possesses unique mechanochemical properties used to tubulate and sever membranes, and is involved in clathrin-mediated endocytosis and other vesicular trafficking processes. Actin and other cytoskeletal proteins act as binding partners for the encoded protein, which can also self-assemble leading to stimulation of GTPase activity. More than sixty highly conserved copies of the 3’ region of this gene are found elsewhere in the genome, particularly on chromosomes Y and 15. Alternatively spliced transcript variants encoding different isoforms have been described. ENSG00000106976 dynamin 1
146330 FBXL16 Members of the F-box protein family, such as FBXL16, are characterized by an approximately 40-amino acid F-box motif. SCF complexes, formed by SKP1 (MIM 601434), cullin (see CUL1; MIM 603134), and F-box proteins, act as protein-ubiquitin ligases. F-box proteins interact with SKP1 through the F box, and they interact with ubiquitination targets through other protein interaction domains (Jin et al., 2004 [PubMed 15520277]). ENSG00000127585 F-box and leucine rich repeat protein 16
287 ANK2 This gene encodes a member of the ankyrin family of proteins that link the integral membrane proteins to the underlying spectrin-actin cytoskeleton. Ankyrins play key roles in activities such as cell motility, activation, proliferation, contact and the maintenance of specialized membrane domains. Most ankyrins are typically composed of three structural domains: an amino-terminal domain containing multiple ankyrin repeats; a central region with a highly conserved spectrin binding domain; and a carboxy-terminal regulatory domain which is the least conserved and subject to variation. The protein encoded by this gene is required for targeting and stability of Na/Ca exchanger 1 in cardiomyocytes. Mutations in this gene cause long QT syndrome 4 and cardiac arrhythmia syndrome. Multiple transcript variants encoding different isoforms have been described. ENSG00000145362 ankyrin 2, neuronal
11075 STMN2 This gene encodes a member of the stathmin family of phosphoproteins. Stathmin proteins function in microtubule dynamics and signal transduction. The encoded protein plays a regulatory role in neuronal growth and is also thought to be involved in osteogenesis. Reductions in the expression of this gene have been associated with Down’s syndrome and Alzheimer’s disease. Alternatively spliced transcript variants have been observed for this gene. A pseudogene of this gene is located on the long arm of chromosome 6. ENSG00000104435 stathmin 2
11170 FAM107A NA ENSG00000168309 family with sequence similarity 107 member A
9145 SYNGR1 This gene encodes an integral membrane protein associated with presynaptic vesicles in neuronal cells. The exact function of this protein is unclear, but studies of a similar murine protein suggest that it functions in synaptic plasticity without being required for synaptic transmission. The gene product belongs to the synaptogyrin gene family. Three alternatively spliced variants encoding three different isoforms have been identified. ENSG00000100321 synaptogyrin 1
8497 PPFIA4 PPFIA4, or liprin-alpha-4, belongs to the liprin-alpha gene family. See liprin-alpha-1 (LIP1, or PPFIA1; MIM 611054) for background on liprins. ENSG00000143847 PTPRF interacting protein alpha 4
4627 MYH9 This gene encodes a conventional non-muscle myosin; this protein should not be confused with the unconventional myosin-9a or 9b (MYO9A or MYO9B). The encoded protein is a myosin IIA heavy chain that contains an IQ domain and a myosin head-like domain which is involved in several important functions, including cytokinesis, cell motility and maintenance of cell shape. Defects in this gene have been associated with non-syndromic sensorineural deafness autosomal dominant type 17, Epstein syndrome, Alport syndrome with macrothrombocytopenia, Sebastian syndrome, Fechtner syndrome and macrothrombocytopenia with progressive sensorineural deafness. ENSG00000100345 myosin, heavy chain 9, non-muscle
6620 SNCB This gene encodes a member of a small family of proteins that inhibit phospholipase D2 and may function in neuronal plasticity. The encoded protein is abundant in lesions of patients with Alzheimer disease. A mutation in this gene was found in individuals with dementia with Lewy bodies. Alternative splicing results in multiple transcript variants. ENSG00000074317 synuclein beta
808 CALM3 NA ENSG00000160014 calmodulin 3 (phosphorylase kinase, delta)
805 CALM2 This gene is a member of the calmodulin gene family. There are three distinct calmodulin genes dispersed throughout the genome that encode the identical protein, but differ at the nucleotide level. Calmodulin is a calcium binding protein that plays a role in signaling pathways, cell cycle progression and proliferation. Several infants with severe forms of long-QT syndrome (LQTS) who displayed life-threatening ventricular arrhythmias together with delayed neurodevelopment and epilepsy were found to have mutations in either this gene or another member of the calmodulin gene family (PMID:23388215). Mutations in this gene have also been identified in patients with less severe forms of LQTS (PMID:24917665), while mutations in another calmodulin gene family member have been associated with catecholaminergic polymorphic ventricular tachycardia (CPVT)(PMID:23040497), a rare disorder thought to be the cause of a significant fraction of sudden cardiac deaths in young individuals. Pseudogenes of this gene are found on chromosomes 10, 13, and 17. Alternative splicing results in multiple transcript variants encoding different isoforms. ENSG00000160014 calmodulin 2 (phosphorylase kinase, delta)
25789 TMEM59L This gene encodes a predicted type-I membrane glycoprotein. The encoded protein may play a role in functioning of the central nervous system. ENSG00000105696 transmembrane protein 59 like
9764 KIAA0513 NA ENSG00000135709 KIAA0513
112755 STX1B The protein encoded by this gene belongs to a family of proteins thought to play a role in the exocytosis of synaptic vesicles. Vesicle exocytosis releases vesicular contents and is important to various cellular functions. For instance, the secretion of transmitters from neurons plays an important role in synaptic transmission. After exocytosis, the membrane and proteins from the vesicle are retrieved from the plasma membrane through the process of endocytosis. Mutations in this gene have been identified as one cause of fever-associated epilepsy syndromes. A possible link between this gene and Parkinson’s disease has also been suggested. ENSG00000099365 syntaxin 1B
770 CA11 Carbonic anhydrases (CAs) are a large family of zinc metalloenzymes that catalyze the reversible hydration of carbon dioxide. They participate in a variety of biological processes, including respiration, calcification, acid-base balance, bone resorption, and the formation of aqueous humor, cerebrospinal fluid, saliva, and gastric acid. They show extensive diversity in tissue distribution and in their subcellular localization. CA XI is likely a secreted protein, however, radical changes at active site residues completely conserved in CA isozymes with catalytic activity, make it unlikely that it has carbonic anhydrase activity. It shares properties in common with two other acatalytic CA isoforms, CA VIII and CA X. CA XI is most abundantly expressed in brain, and may play a general role in the central nervous system. ENSG00000063180 carbonic anhydrase 11
59 ACTA2 The protein encoded by this gene belongs to the actin family of proteins, which are highly conserved proteins that play a role in cell motility, structure and integrity. Alpha, beta and gamma actin isoforms have been identified, with alpha actins being a major constituent of the contractile apparatus, while beta and gamma actins are involved in the regulation of cell motility. This actin is an alpha actin that is found in skeletal muscle. Defects in this gene cause aortic aneurysm familial thoracic type 6. Multiple alternatively spliced variants, encoding the same protein, have been identified. ENSG00000107796 actin, alpha 2, smooth muscle, aorta
7057 THBS1 The protein encoded by this gene is a subunit of a disulfide-linked homotrimeric protein. This protein is an adhesive glycoprotein that mediates cell-to-cell and cell-to-matrix interactions. This protein can bind to fibrinogen, fibronectin, laminin, type V collagen and integrins alpha-V/beta-1. This protein has been shown to play roles in platelet aggregation, angiogenesis, and tumorigenesis. ENSG00000137801 thrombospondin 1
1363 CPE This gene encodes a member of the M14 family of metallocarboxypeptidases. The encoded preproprotein is proteolytically processed to generate the mature peptidase. This peripheral membrane protein cleaves C-terminal amino acid residues and is involved in the biosynthesis of peptide hormones and neurotransmitters, including insulin. This protein may also function independently of its peptidase activity, as a neurotrophic factor that promotes neuronal survival, and as a sorting receptor that binds to regulated secretory pathway proteins, including prohormones. Mutations in this gene are implicated in type 2 diabetes. ENSG00000109472 carboxypeptidase E
9762 LZTS3 NA ENSG00000088899 leucine zipper, putative tumor suppressor family member 3
192683 SCAMP5 NA ENSG00000198794 secretory carrier membrane protein 5
482 ATP1B2 The protein encoded by this gene belongs to the family of Na+/K+ and H+/K+ ATPases beta chain proteins, and to the subfamily of Na+/K+ -ATPases. Na+/K+ -ATPase is an integral membrane protein responsible for establishing and maintaining the electrochemical gradients of Na and K ions across the plasma membrane. These gradients are essential for osmoregulation, for sodium-coupled transport of a variety of organic and inorganic molecules, and for electrical excitability of nerve and muscle. This enzyme is composed of two subunits, a large catalytic subunit (alpha) and a smaller glycoprotein subunit (beta). The beta subunit regulates, through assembly of alpha/beta heterodimers, the number of sodium pumps transported to the plasma membrane. The glycoprotein subunit of Na+/K+ -ATPase is encoded by multiple genes. This gene encodes a beta 2 subunit. Two transcript variants encoding different isoforms have been found for this gene. ENSG00000129244 ATPase Na+/K+ transporting subunit beta 2
3133 HLA-E HLA-E belongs to the HLA class I heavy chain paralogues. This class I molecule is a heterodimer consisting of a heavy chain and a light chain (beta-2 microglobulin). The heavy chain is anchored in the membrane. HLA-E binds a restricted subset of peptides derived from the leader peptides of other class I molecules. The heavy chain is approximately 45 kDa and its gene contains 8 exons. Exon one encodes the leader peptide, exons 2 and 3 encode the alpha1 and alpha2 domains, which both bind the peptide, exon 4 encodes the alpha3 domain, exon 5 encodes the transmembrane region, and exons 6 and 7 encode the cytoplasmic tail. ENSG00000204592 major histocompatibility complex, class I, E
2026 ENO2 This gene encodes one of the three enolase isoenzymes found in mammals. This isoenzyme, a homodimer, is found in mature neurons and cells of neuronal origin. A switch from alpha enolase to gamma enolase occurs in neural tissue during development in rats and primates. ENSG00000111674 enolase 2
23542 MAPK8IP2 The protein encoded by this gene is closely related to MAPK8IP1/IB1/JIP-1, a scaffold protein that is involved in the c-Jun amino-terminal kinase signaling pathway. This protein is expressed in brain and pancreatic cells. It has been shown to interact with, and regulate the activity of MAPK8/JNK1, and MAP2K7/MKK7 kinases. This protein thus is thought to function as a regulator of signal transduction by protein kinase cascade in brain and pancreatic beta-cells. ENSG00000008735 mitogen-activated protein kinase 8 interacting protein 2
1634 DCN This gene encodes a member of the small leucine-rich proteoglycan family of proteins. Alternative splicing results in multiple transcript variants, at least one of which encodes a preproprotein that is proteolytically processed to generate the mature protein. This protein plays a role in collagen fibril assembly. Binding of this protein to multiple cell surface receptors mediates its role in tumor suppression, including a stimulatory effect on autophagy and inflammation and an inhibitory effect on angiogenesis and tumorigenesis. This gene and the related gene biglycan are thought to be the result of a gene duplication. Mutations in this gene are associated with congenital stromal corneal dystrophy in human patients. ENSG00000011465 decorin
4313 MMP2 This gene is a member of the matrix metalloproteinase (MMP) gene family, whose members are zinc-dependent enzymes capable of cleaving components of the extracellular matrix and molecules involved in signal transduction. The protein encoded by this gene is a gelatinase A, type IV collagenase, that contains three fibronectin type II repeats in its catalytic site that allow binding of denatured type IV and V collagen and elastin. Unlike most MMP family members, activation of this protein can occur on the cell membrane. This enzyme can be activated extracellularly by proteases, or intracellularly by its S-glutathiolation with no requirement for proteolytic removal of the pro-domain. This protein is thought to be involved in multiple pathways including roles in the nervous system, endometrial menstrual breakdown, regulation of vascularization, and metastasis. Mutations in this gene have been associated with Winchester syndrome and Nodulosis-Arthropathy-Osteolysis (NAO) syndrome. Alternative splicing results in multiple transcript variants encoding different isoforms. ENSG00000087245 matrix metallopeptidase 2
302 ANXA2 This gene encodes a member of the annexin family. Members of this calcium-dependent phospholipid-binding protein family play a role in the regulation of cellular growth and in signal transduction pathways. This protein functions as an autocrine factor which heightens osteoclast formation and bone resorption. This gene has three pseudogenes located on chromosomes 4, 9 and 10, respectively. Multiple alternatively spliced transcript variants encoding different isoforms have been found for this gene. ENSG00000182718 annexin A2
2192 FBLN1 Fibulin 1 is a secreted glycoprotein that becomes incorporated into a fibrillar extracellular matrix. Calcium-binding is apparently required to mediate its binding to laminin and nidogen. It mediates platelet adhesion via binding fibrinogen. Four splice variants which differ in the 3’ end have been identified. Each variant encodes a different isoform, but no functional distinctions have been identified among the four variants. ENSG00000077942 fibulin 1
972 CD74 The protein encoded by this gene associates with class II major histocompatibility complex (MHC) and is an important chaperone that regulates antigen presentation for immune response. It also serves as cell surface receptor for the cytokine macrophage migration inhibitory factor (MIF) which, when bound to the encoded protein, initiates survival pathways and cell proliferation. This protein also interacts with amyloid precursor protein (APP) and suppresses the production of amyloid beta (Abeta). Multiple alternatively spliced transcript variants encoding different isoforms have been identified. ENSG00000019582 CD74 molecule
23095 KIF1B This gene encodes a motor protein that transports mitochondria and synaptic vesicle precursors. Mutations in this gene cause Charcot-Marie-Tooth disease, type 2A1. ENSG00000054523 kinesin family member 1B
9379 NRXN2 This gene encodes a member of the neurexin gene family. The products of these genes function as cell adhesion molecules and receptors in the vertebrate nervous system. These genes utilize two promoters. The majority of transcripts are produced from the upstream promoter and encode alpha-neurexin isoforms while a smaller number of transcripts are produced from the downstream promoter and encode beta-neurexin isoforms. The alpha-neurexins contain epidermal growth factor-like (EGF-like) sequences and laminin G domains, and have been shown to interact with neurexophilins. The beta-neurexins lack EGF-like sequences and contain fewer laminin G domains than alpha-neurexins. Alternative splicing and the use of alternative promoters may generate thousands of transcript variants (PMID: 12036300, PMID: 11944992). ENSG00000110076 neurexin 2
23413 NCS1 This gene is a member of the neuronal calcium sensor gene family, which encode calcium-binding proteins expressed predominantly in neurons. The protein encoded by this gene regulates G protein-coupled receptor phosphorylation in a calcium-dependent manner and can substitute for calmodulin. The protein is associated with secretory granules and modulates synaptic transmission and synaptic plasticity. Multiple transcript variants encoding different isoforms have been found for this gene. ENSG00000107130 neuronal calcium sensor 1
10472 ZBTB18 This gene encodes a C2H2-type zinc finger protein which acts as a transcriptional repressor of genes involved in neuronal development. The encoded protein recognizes a specific sequence motif and recruits components of chromatin to target genes. Alternative splicing results in multiple transcript variants. ENSG00000179456 zinc finger and BTB domain containing 18
79026 AHNAK NA ENSG00000124942 AHNAK nucleoprotein
23362 PSD3 NA ENSG00000156011 pleckstrin and Sec7 domain containing 3
5310 PKD1 This gene encodes a member of the polycystin protein family. The encoded glycoprotein contains a large N-terminal extracellular region, multiple transmembrane domains and a cytoplasmic C-tail. It is an integral membrane protein that functions as a regulator of calcium permeable cation channels and intracellular calcium homoeostasis. It is also involved in cell-cell/matrix interactions and may modulate G-protein-coupled signal-transduction pathways. It plays a role in renal tubular development, and mutations in this gene cause autosomal dominant polycystic kidney disease type 1 (ADPKD1). ADPKD1 is characterized by the growth of fluid-filled cysts that replace normal renal tissue and result in end-stage renal failure. Splice variants encoding different isoforms have been noted for this gene. Also, six pseudogenes, closely linked in a known duplicated region on chromosome 16p, have been described. ENSG00000008710 polycystin 1, transient receptor potential channel interacting
7431 VIM This gene encodes a member of the intermediate filament family. Intermediate filaments, along with microtubules and actin microfilaments, make up the cytoskeleton. The protein encoded by this gene is responsible for maintaining cell shape, integrity of the cytoplasm, and stabilizing cytoskeletal interactions. It is also involved in the immune response, and controls the transport of low-density lipoprotein (LDL)-derived cholesterol from a lysosome to the site of esterification. It functions as an organizer of a number of critical proteins involved in attachment, migration, and cell signaling. Mutations in this gene cause a dominant, pulverulent cataract. ENSG00000026025 vimentin
165 AEBP1 This gene encodes a member of carboxypeptidase A protein family. The encoded protein may function as a transcriptional repressor and play a role in adipogenesis and smooth muscle cell differentiation. Studies in mice suggest that this gene functions in wound healing and abdominal wall development. Overexpression of this gene is associated with glioblastoma. ENSG00000106624 AE binding protein 1
155066 ATP6V0E2 Multisubunit vacuolar-type proton pumps, or H(+)-ATPases, acidify various intracellular compartments, such as vacuoles, clathrin-coated and synaptic vesicles, endosomes, lysosomes, and chromaffin granules. H(+)-ATPases are also found in plasma membranes of specialized cells, where they play roles in urinary acidification, bone resorption, and sperm maturation. Multiple subunits form H(+)-ATPases, with proteins of the V1 class hydrolyzing ATP for energy to transport H+, and proteins of the V0 class forming an integral membrane domain through which H+ is transported. ATP6V0E2 encodes an isoform of the H(+)-ATPase V0 e subunit, an essential proton pump component (Blake-Palmer et al., 2007 [PubMed 17350184]). ENSG00000171130 ATPase H+ transporting V0 subunit e2
51286 CEND1 The protein encoded by this gene is a neuron-specific protein. The similar protein in pig enhances neuroblastoma cell differentiation in vitro and may be involved in neuronal differentiation in vivo. Multiple pseudogenes have been reported for this gene. ENSG00000184524 cell cycle exit and neuronal differentiation 1
7532 YWHAG This gene product belongs to the 14-3-3 family of proteins which mediate signal transduction by binding to phosphoserine-containing proteins. This highly conserved protein family is found in both plants and mammals, and this protein is 100% identical to the rat ortholog. It is induced by growth factors in human vascular smooth muscle cells, and is also highly expressed in skeletal and heart muscles, suggesting an important role for this protein in muscle tissue. It has been shown to interact with RAF1 and protein kinase C, proteins involved in various signal transduction pathways. ENSG00000170027 tyrosine 3-monooxygenase/tryptophan 5-monooxygenase activation protein gamma
10313 RTN3 This gene belongs to the reticulon family of highly conserved genes that are preferentially expressed in neuroendocrine tissues. This family of proteins interact with, and modulate the activity of beta-amyloid converting enzyme 1 (BACE1), and the production of amyloid-beta. An increase in the expression of any reticulon protein substantially reduces the production of amyloid-beta, suggesting that reticulon proteins are negative modulators of BACE1 in cells. Alternatively spliced transcript variants encoding different isoforms have been found for this gene, and pseudogenes of this gene are located on chromosomes 4 and 12. ENSG00000133318 reticulon 3
63908 NAPB NA ENSG00000125814 NSF attachment protein beta
4131 MAP1B This gene encodes a protein that belongs to the microtubule-associated protein family. The proteins of this family are thought to be involved in microtubule assembly, which is an essential step in neurogenesis. The product of this gene is a precursor polypeptide that presumably undergoes proteolytic processing to generate the final MAP1B heavy chain and LC1 light chain. Gene knockout studies of the mouse microtubule-associated protein 1B gene suggested an important role in development and function of the nervous system. ENSG00000131711 microtubule associated protein 1B
114088 TRIM9 The protein encoded by this gene is a member of the tripartite motif (TRIM) family. The TRIM motif includes three zinc-binding domains, a RING, a B-box type 1 and a B-box type 2, and a coiled-coil region. The protein localizes to cytoplasmic bodies. Its function has not been identified. Alternate splicing of this gene generates two transcript variants encoding different isoforms. ENSG00000100505 tripartite motif containing 9
1742 DLG4 This gene encodes a member of the membrane-associated guanylate kinase (MAGUK) family. It heteromultimerizes with another MAGUK protein, DLG2, and is recruited into NMDA receptor and potassium channel clusters. These two MAGUK proteins may interact at postsynaptic sites to form a multimeric scaffold for the clustering of receptors, ion channels, and associated signaling proteins. Multiple transcript variants encoding different isoforms have been found for this gene. ENSG00000132535 discs large MAGUK scaffold protein 4
8522 GAS7 Growth arrest-specific 7 is expressed primarily in terminally differentiated brain cells and predominantly in mature cerebellar Purkinje neurons. GAS7 plays a putative role in neuronal development. Several transcript variants encoding proteins which vary in the N-terminus have been described. ENSG00000007237 growth arrest specific 7
ENSG00000247556 OIP5-AS1 NA ENSG00000247556 OIP5 antisense RNA 1
7345 UCHL1 The protein encoded by this gene belongs to the peptidase C12 family. This enzyme is a thiol protease that hydrolyzes a peptide bond at the C-terminal glycine of ubiquitin. This gene is specifically expressed in the neurons and in cells of the diffuse neuroendocrine system. Mutations in this gene may be associated with Parkinson disease. ENSG00000154277 ubiquitin C-terminal hydrolase L1
9806 SPOCK2 This gene encodes a protein which binds with glycosaminoglycans to form part of the extracellular matrix. The protein contains thyroglobulin type-1, follistatin-like, and calcium-binding domains, and has glycosaminoglycan attachment sites in the acidic C-terminal region. Three alternatively spliced transcript variants that encode different protein isoforms have been described for this gene. ENSG00000107742 sparc/osteonectin, cwcv and kazal-like domains proteoglycan (testican) 2
1293 COL6A3 This gene encodes the alpha-3 chain, one of the three alpha chains of type VI collagen, a beaded filament collagen found in most connective tissues. The alpha-3 chain of type VI collagen is much larger than the alpha-1 and -2 chains. This difference in size is largely due to an increase in the number of subdomains, similar to von Willebrand Factor type A domains, that are found in the amino terminal globular domain of all the alpha chains. These domains have been shown to bind extracellular matrix proteins, an interaction that explains the importance of this collagen in organizing matrix components. Mutations in the type VI collagen genes are associated with Bethlem myopathy, a rare autosomal dominant proximal myopathy with early childhood onset. Mutations in this gene are also a cause of Ullrich congenital muscular dystrophy, also referred to as Ullrich scleroatonic muscular dystrophy, an autosomal recessive congenital myopathy that is more severe than Bethlem myopathy. Multiple transcript variants have been identified, but the full-length nature of only some of these variants has been described. ENSG00000163359 collagen type VI alpha 3 chain
3039 HBA1 The human alpha globin gene cluster located on chromosome 16 spans about 30 kb and includes seven loci: 5’- zeta - pseudozeta - mu - pseudoalpha-1 - alpha-2 - alpha-1 - theta - 3’. The alpha-2 (HBA2) and alpha-1 (HBA1) coding sequences are identical. These genes differ slightly over the 5’ untranslated regions and the introns, but they differ significantly over the 3’ untranslated regions. Two alpha chains plus two beta chains constitute HbA, which in normal adult life comprises about 97% of the total hemoglobin; alpha chains combine with delta chains to constitute HbA-2, which with HbF (fetal hemoglobin) makes up the remaining 3% of adult hemoglobin. Alpha thalassemias result from deletions of each of the alpha genes as well as deletions of both HBA2 and HBA1; some nondeletion alpha thalassemias have also been reported. ENSG00000206172 hemoglobin subunit alpha 1
166 AES The protein encoded by this gene is similar in sequence to the amino terminus of Drosophila enhancer of split groucho, a protein involved in neurogenesis during embryonic development. The encoded protein, which belongs to the groucho/TLE family of proteins, can function as a homooligomer or as a heterooligomer with other family members to dominantly repress the expression of other family member genes. Three transcript variants encoding different isoforms have been found for this gene. ENSG00000104964 amino-terminal enhancer of split
1915 EEF1A1 This gene encodes an isoform of the alpha subunit of the elongation factor-1 complex, which is responsible for the enzymatic delivery of aminoacyl tRNAs to the ribosome. This isoform (alpha 1) is expressed in brain, placenta, lung, liver, kidney, and pancreas, and the other isoform (alpha 2) is expressed in brain, heart and skeletal muscle. This isoform is identified as an autoantigen in 66% of patients with Felty syndrome. This gene has been found to have multiple copies on many chromosomes, some of which, if not all, represent different pseudogenes. ENSG00000156508 eukaryotic translation elongation factor 1 alpha 1
57476 GRAMD1B NA ENSG00000023171 GRAM domain containing 1B
4185 ADAM11 This gene encodes a member of the ADAM (a disintegrin and metalloprotease) protein family. Members of this family are membrane-anchored proteins structurally related to snake venom disintegrins, and have been implicated in a variety of biological processes involving cell-cell and cell-matrix interactions, including fertilization, muscle development, and neurogenesis. The encoded preproprotein is proteolytically processed to generate the mature protease. This gene represents a candidate tumor suppressor gene for human breast cancer based on its location within a minimal region of chromosome 17q21 previously defined by tumor deletion mapping. Alternative splicing results in multiple transcript variants, at least one of which encodes an isoform that is proteolytically processed. ENSG00000073670 ADAM metallopeptidase domain 11
4504 MT3 NA ENSG00000087250 metallothionein 3
23467 NPTXR This gene encodes a protein similar to the rat neuronal pentraxin receptor. The rat pentraxin receptor is an integral membrane protein that is thought to mediate neuronal uptake of the snake venom toxin, taipoxin, and its transport into the synapses. Studies in rat indicate that translation of this mRNA initiates at a non-AUG (CUG) codon. This may also be true for mouse and human, based on strong sequence conservation amongst these species. ENSG00000221890 neuronal pentraxin receptor
51310 SLC22A17 NA ENSG00000092096 solute carrier family 22 member 17
4256 MGP The protein encoded by this gene is secreted and likely acts as an inhibitor of bone formation. The encoded protein is found in the organic matrix of bone and cartilage. Defects in this gene are a cause of Keutel syndrome (KS). Two transcript variants encoding different isoforms have been found for this gene. ENSG00000111341 matrix Gla protein
25999 CLIP3 This gene encodes a member of the cytoplasmic linker protein 170 family. Members of this protein family contain a cytoskeleton-associated protein glycine-rich domain and mediate the interaction of microtubules with cellular organelles. The encoded protein plays a role in T cell apoptosis by facilitating the association of tubulin and the lipid raft ganglioside GD3. The encoded protein also functions as a scaffold protein mediating membrane localization of phosphorylated protein kinase B. Alternatively spliced transcript variants have been observed for this gene. ENSG00000105270 CAP-Gly domain containing linker protein 3
9900 SV2A NA ENSG00000159164 synaptic vesicle glycoprotein 2A
ENSG00000225630 MTND2P28 NA ENSG00000225630 mitochondrially encoded NADH:ubiquinone oxidoreductase core subunit 2 pseudogene 28
6324 SCN1B Voltage-gated sodium channels are heteromeric proteins that function in the generation and propagation of action potentials in muscle and neuronal cells. They are composed of one alpha and two beta subunits, where the alpha subunit provides channel activity and the beta-1 subunit modulates the kinetics of channel inactivation. This gene encodes a sodium channel beta-1 subunit. Mutations in this gene result in generalized epilepsy with febrile seizures plus, Brugada syndrome 5, and defects in cardiac conduction. Multiple transcript variants encoding different isoforms have been found for this gene. ENSG00000105711 sodium voltage-gated channel beta subunit 1
7045 TGFBI This gene encodes an RGD-containing protein that binds to type I, II and IV collagens. The RGD motif is found in many extracellular matrix proteins modulating cell adhesion and serves as a ligand recognition sequence for several integrins. This protein plays a role in cell-collagen interactions and may be involved in endochondral bone formation in cartilage. The protein is induced by transforming growth factor-beta and acts to inhibit cell adhesion. Mutations in this gene are associated with multiple types of corneal dystrophy. ENSG00000120708 transforming growth factor beta induced
5037 PEBP1 This gene encodes a member of the phosphatidylethanolamine-binding family of proteins and has been shown to modulate multiple signaling pathways, including the MAP kinase (MAPK), NF-kappa B, and glycogen synthase kinase-3 (GSK-3) signaling pathways. The encoded protein can be further processed to form a smaller cleavage product, hippocampal cholinergic neurostimulating peptide (HCNP), which may be involved in neural development. This gene has been implicated in numerous human cancers and may act as a metastasis suppressor gene. Multiple pseudogenes of this gene have been identified in the genome. ENSG00000089220 phosphatidylethanolamine binding protein 1
9783 RIMS3 NA ENSG00000117016 regulating synaptic membrane exocytosis 3
50861 STMN3 This gene encodes a protein which is a member of the stathmin protein family. Members of this protein family form a complex with tubulins at a ratio of 2 tubulins for each stathmin protein. Microtubules require the ordered assembly of alpha- and beta-tubulins, and formation of a complex with stathmin disrupts microtubule formation and function. A pseudogene of this gene is located on chromosome 22. Alternative splicing results in multiple transcript variants. ENSG00000197457 stathmin 3
57731 SPTBN4 Spectrin is an actin crosslinking and molecular scaffold protein that links the plasma membrane to the actin cytoskeleton, and functions in the determination of cell shape, arrangement of transmembrane proteins, and organization of organelles. It is composed of two antiparallel dimers of alpha- and beta- subunits. This gene is one member of a family of beta-spectrin genes. The encoded protein localizes to the nuclear matrix, PML nuclear bodies, and cytoplasmic vesicles. A highly similar gene in the mouse is required for localization of specific membrane proteins in polarized regions of neurons. Multiple transcript variants encoding different isoforms have been found for this gene. ENSG00000160460 spectrin beta, non-erythrocytic 4
ENSG00000237973 MTCO1P12 NA ENSG00000237973 MT-CO1 pseudogene 12
60 ACTB This gene encodes one of six different actin proteins. Actins are highly conserved proteins that are involved in cell motility, structure, and integrity. This actin is a major constituent of the contractile apparatus and one of the two nonmuscle cytoskeletal actins. ENSG00000075624 actin, beta
3487 IGFBP4 This gene is a member of the insulin-like growth factor binding protein (IGFBP) family and encodes a protein with an IGFBP domain and a thyroglobulin type-I domain. The protein binds both insulin-like growth factors (IGFs) I and II and circulates in the plasma in both glycosylated and non-glycosylated forms. Binding of this protein prolongs the half-life of the IGFs and alters their interaction with cell surface receptors. ENSG00000141753 insulin like growth factor binding protein 4
116986 AGAP2 The protein encoded by this gene belongs to the centaurin gamma-like family. It mediates anti-apoptotic effects of nerve growth factor by activating nuclear phosphoinositide 3-kinase. It is overexpressed in cancer cells, and promotes cancer cell invasion. Alternatively spliced transcript variants encoding different isoforms have been described for this gene. ENSG00000135439 ArfGAP with GTPase domain, ankyrin repeat and PH domain 2
8425 LTBP4 The protein encoded by this gene binds transforming growth factor beta (TGFB) as it is secreted and targeted to the extracellular matrix. TGFB is biologically latent after secretion and insertion into the extracellular matrix, and sheds TGFB and other proteins upon activation. Defects in this gene may be a cause of cutis laxa and severe pulmonary, gastrointestinal, and urinary abnormalities. Three transcript variants encoding different isoforms have been found for this gene. ENSG00000090006 latent transforming growth factor beta binding protein 4
write.table(as.factor(out$query), paste0("../utilities/GTEX2013_sparse_load_sqrt/gene_names_clus_",5,".txt"), col.names = FALSE,
            row.names=FALSE, quote=FALSE);
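
The query-then-write step above is repeated for each factor with only the factor index changing. A minimal sketch of how the file-writing half could be wrapped in a helper is given here; the function name `write_factor_genes`, its arguments, and the use of `tempdir()` in the usage lines are hypothetical illustrations, not part of the original analysis, and `writeLines` is swapped in for `write.table` because it produces the same one-ID-per-line file for a character vector.

```r
# Hypothetical helper (an assumption, not from the original script):
# write the queried Ensembl IDs for one factor to a text file, one ID per line.
write_factor_genes <- function(query_ids, factor_index, out_dir) {
  # File name follows the gene_names_clus_<k>.txt pattern used in the report.
  out_file <- file.path(out_dir, paste0("gene_names_clus_", factor_index, ".txt"))
  writeLines(as.character(query_ids), out_file)
  invisible(out_file)
}

# Usage with a toy vector standing in for out$query.
ids <- c("ENSG00000077942", "ENSG00000019582")
path <- write_factor_genes(ids, 5, tempdir())
readLines(path)
```

In the report itself, `out$query` would be passed as `query_ids` and the `../utilities/GTEX2013_sparse_load_sqrt` directory as `out_dir`.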

Factor 6 Annotations

out <- mygene::queryMany(gene_list[6,],  scopes="ensembl.gene", fields=c("name", "summary", "symbol"), species="human");
## Finished
## Pass returnall=TRUE to return lists of duplicate or missing query terms.
kable(as.data.frame(out))
X_id name summary symbol query
3043 hemoglobin subunit beta The alpha (HBA) and beta (HBB) loci determine the structure of the 2 types of polypeptide chains in adult hemoglobin, Hb A. The normal adult hemoglobin tetramer consists of two alpha chains and two beta chains. Mutant beta globin causes sickle cell anemia. Absence of beta chain causes beta-zero-thalassemia. Reduced amounts of detectable beta globin causes beta-plus-thalassemia. The order of the genes in the beta-globin cluster is 5’-epsilon – gamma-G – gamma-A – delta – beta–3’. HBB ENSG00000244734
3039 hemoglobin subunit alpha 1 The human alpha globin gene cluster located on chromosome 16 spans about 30 kb and includes seven loci: 5’- zeta - pseudozeta - mu - pseudoalpha-1 - alpha-2 - alpha-1 - theta - 3’. The alpha-2 (HBA2) and alpha-1 (HBA1) coding sequences are identical. These genes differ slightly over the 5’ untranslated regions and the introns, but they differ significantly over the 3’ untranslated regions. Two alpha chains plus two beta chains constitute HbA, which in normal adult life comprises about 97% of the total hemoglobin; alpha chains combine with delta chains to constitute HbA-2, which with HbF (fetal hemoglobin) makes up the remaining 3% of adult hemoglobin. Alpha thalassemias result from deletions of each of the alpha genes as well as deletions of both HBA2 and HBA1; some nondeletion alpha thalassemias have also been reported. HBA1 ENSG00000206172
1277 collagen type I alpha 1 This gene encodes the pro-alpha1 chains of type I collagen whose triple helix comprises two alpha1 chains and one alpha2 chain. Type I is a fibril-forming collagen found in most connective tissues and is abundant in bone, cornea, dermis and tendon. Mutations in this gene are associated with osteogenesis imperfecta types I-IV, Ehlers-Danlos syndrome type VIIA, Ehlers-Danlos syndrome Classical type, Caffey Disease and idiopathic osteoporosis. Reciprocal translocations between chromosomes 17 and 22, where this gene and the gene for platelet-derived growth factor beta are located, are associated with a particular type of skin tumor called dermatofibrosarcoma protuberans, resulting from unregulated expression of the growth factor. Two transcripts, resulting from the use of alternate polyadenylation signals, have been identified for this gene. COL1A1 ENSG00000108821
6711 spectrin beta, non-erythrocytic 1 Spectrin is an actin crosslinking and molecular scaffold protein that links the plasma membrane to the actin cytoskeleton, and functions in the determination of cell shape, arrangement of transmembrane proteins, and organization of organelles. It is composed of two antiparallel dimers of alpha- and beta- subunits. This gene is one member of a family of beta-spectrin genes. The encoded protein contains an N-terminal actin-binding domain, and 17 spectrin repeats which are involved in dimer formation. Multiple transcript variants encoding different isoforms have been found for this gene. SPTBN1 ENSG00000115306
6280 S100 calcium binding protein A9 The protein encoded by this gene is a member of the S100 family of proteins containing 2 EF-hand calcium-binding motifs. S100 proteins are localized in the cytoplasm and/or nucleus of a wide range of cells, and involved in the regulation of a number of cellular processes such as cell cycle progression and differentiation. S100 genes include at least 13 members which are located as a cluster on chromosome 1q21. This protein may function in the inhibition of casein kinase and altered expression of this protein is associated with the disease cystic fibrosis. This antimicrobial protein exhibits antifungal and antibacterial activity. S100A9 ENSG00000163220
3848 keratin 1 The protein encoded by this gene is a member of the keratin gene family. The type II cytokeratins consist of basic or neutral proteins which are arranged in pairs of heterotypic keratin chains coexpressed during differentiation of simple and stratified epithelial tissues. This type II cytokeratin is specifically expressed in the spinous and granular layers of the epidermis with family member KRT10 and mutations in these genes have been associated with bullous congenital ichthyosiform erythroderma. The type II cytokeratins are clustered in a region of chromosome 12q12-q13. KRT1 ENSG00000167768
8404 SPARC like 1 NA SPARCL1 ENSG00000152583
2335 fibronectin 1 This gene encodes fibronectin, a glycoprotein present in a soluble dimeric form in plasma, and in a dimeric or multimeric form at the cell surface and in extracellular matrix. The encoded preproprotein is proteolytically processed to generate the mature protein. Fibronectin is involved in cell adhesion and migration processes including embryogenesis, wound healing, blood coagulation, host defense, and metastasis. The gene has three regions subject to alternative splicing, with the potential to produce 20 different transcript variants, at least one of which encodes an isoform that undergoes proteolytic processing. The full-length nature of some variants has not been determined. FN1 ENSG00000115414
3040 hemoglobin subunit alpha 2 The human alpha globin gene cluster located on chromosome 16 spans about 30 kb and includes seven loci: 5’- zeta - pseudozeta - mu - pseudoalpha-1 - alpha-2 - alpha-1 - theta - 3’. The alpha-2 (HBA2) and alpha-1 (HBA1) coding sequences are identical. These genes differ slightly over the 5’ untranslated regions and the introns, but they differ significantly over the 3’ untranslated regions. Two alpha chains plus two beta chains constitute HbA, which in normal adult life comprises about 97% of the total hemoglobin; alpha chains combine with delta chains to constitute HbA-2, which with HbF (fetal hemoglobin) makes up the remaining 3% of adult hemoglobin. Alpha thalassemias result from deletions of each of the alpha genes as well as deletions of both HBA2 and HBA1; some nondeletion alpha thalassemias have also been reported. HBA2 ENSG00000188536
6678 secreted protein acidic and cysteine rich This gene encodes a cysteine-rich acidic matrix-associated protein. The encoded protein is required for the collagen in bone to become calcified but is also involved in extracellular matrix synthesis and promotion of changes to cell shape. The gene product has been associated with tumor suppression but has also been correlated with metastasis based on changes to cell shape which can promote tumor cell invasion. Three transcript variants encoding different isoforms have been found for this gene. SPARC ENSG00000113140
975 CD81 molecule The protein encoded by this gene is a member of the transmembrane 4 superfamily, also known as the tetraspanin family. Most of these members are cell-surface proteins that are characterized by the presence of four hydrophobic domains. The proteins mediate signal transduction events that play a role in the regulation of cell development, activation, growth and motility. This encoded protein is a cell surface glycoprotein that is known to complex with integrins. This protein appears to promote muscle cell fusion and support myotube maintenance. Also it may be involved in signal transduction. This gene is localized in the tumor-suppressor gene region and thus it is a candidate gene for malignancies. Two transcript variants encoding different isoforms have been found for this gene. CD81 ENSG00000110651
4155 myelin basic protein The protein encoded by the classic MBP gene is a major constituent of the myelin sheath of oligodendrocytes and Schwann cells in the nervous system. However, MBP-related transcripts are also present in the bone marrow and the immune system. These mRNAs arise from the long MBP gene (otherwise called ‘Golli-MBP’) that contains 3 additional exons located upstream of the classic MBP exons. Alternative splicing from the Golli and the MBP transcription start sites gives rise to 2 sets of MBP-related transcripts and gene products. The Golli mRNAs contain 3 exons unique to Golli-MBP, spliced in-frame to 1 or more MBP exons. They encode hybrid proteins that have N-terminal Golli aa sequence linked to MBP aa sequence. The second family of transcripts contain only MBP exons and produce the well characterized myelin basic proteins. This complex gene structure is conserved among species suggesting that the MBP transcription unit is an integral part of the Golli transcription unit and that this arrangement is important for the function and/or regulation of these genes. MBP ENSG00000197971
11167 follistatin like 1 This gene encodes a protein with similarity to follistatin, an activin-binding protein. It contains an FS module, a follistatin-like sequence containing 10 conserved cysteine residues. This gene product is thought to be an autoantigen associated with rheumatoid arthritis. FSTL1 ENSG00000163430
2034 endothelial PAS domain protein 1 This gene encodes a transcription factor involved in the induction of genes regulated by oxygen, which is induced as oxygen levels fall. The encoded protein contains a basic-helix-loop-helix domain protein dimerization domain as well as a domain found in proteins in signal transduction pathways which respond to oxygen levels. Mutations in this gene are associated with erythrocytosis familial type 4. EPAS1 ENSG00000116016
1278 collagen type I alpha 2 chain This gene encodes the pro-alpha2 chain of type I collagen whose triple helix comprises two alpha1 chains and one alpha2 chain. Type I is a fibril-forming collagen found in most connective tissues and is abundant in bone, cornea, dermis and tendon. Mutations in this gene are associated with osteogenesis imperfecta types I-IV, Ehlers-Danlos syndrome type VIIB, recessive Ehlers-Danlos syndrome Classical type, idiopathic osteoporosis, and atypical Marfan syndrome. Symptoms associated with mutations in this gene, however, tend to be less severe than mutations in the gene for the alpha1 chain of type I collagen (COL1A1) reflecting the different role of alpha2 chains in matrix integrity. Three transcripts, resulting from the use of alternate polyadenylation signals, have been identified for this gene. COL1A2 ENSG00000164692
4035 LDL receptor related protein 1 This gene encodes a member of the low-density lipoprotein receptor family of proteins. The encoded preproprotein is proteolytically processed by furin to generate 515 kDa and 85 kDa subunits that form the mature receptor (PMID: 8546712). This receptor is involved in several cellular processes, including intracellular signaling, lipid homeostasis, and clearance of apoptotic cells. In addition, the encoded protein is necessary for the alpha 2-macroglobulin-mediated clearance of secreted amyloid precursor protein and beta-amyloid, the main component of amyloid plaques found in Alzheimer patients. Expression of this gene decreases with age and has been found to be lower than controls in brain tissue from Alzheimer’s disease patients. LRP1 ENSG00000123384
2 alpha-2-macroglobulin Alpha-2-macroglobulin is a protease inhibitor and cytokine transporter. It inhibits many proteases, including trypsin, thrombin and collagenase. A2M is implicated in Alzheimer disease (AD) due to its ability to mediate the clearance and degradation of A-beta, the major component of beta-amyloid deposits. A2M ENSG00000175899
1490 connective tissue growth factor The protein encoded by this gene is a mitogen that is secreted by vascular endothelial cells. The encoded protein plays a role in chondrocyte proliferation and differentiation, cell adhesion in many cell types, and is related to platelet-derived growth factor. Certain polymorphisms in this gene have been linked with a higher incidence of systemic sclerosis. CTGF ENSG00000118523
3858 keratin 10 This gene encodes a member of the type I (acidic) cytokeratin family, which belongs to the superfamily of intermediate filament (IF) proteins. Keratins are heteropolymeric structural proteins which form the intermediate filament. These filaments, along with actin microfilaments and microtubules, compose the cytoskeleton of epithelial cells. Mutations in this gene are associated with epidermolytic hyperkeratosis. This gene is located within a cluster of keratin family members on chromosome 17q21. KRT10 ENSG00000186395
51629 solute carrier family 25 member 39 This gene encodes a member of the SLC25 transporter or mitochondrial carrier family of proteins. Members of this family are encoded by the nuclear genome while their protein products are usually embedded in the inner mitochondrial membrane and exhibit wide-ranging substrate specificity. Although the encoded protein is currently considered an orphan transporter, this protein is related to other carriers known to transport amino acids. This protein may play a role in iron homeostasis. SLC25A39 ENSG00000013306
1465 cysteine and glycine rich protein 1 This gene encodes a member of the cysteine-rich protein (CSRP) family. This gene family includes a group of LIM domain proteins, which may be involved in regulatory processes important for development and cellular differentiation. The LIM/double zinc-finger motif found in this gene product occurs in proteins with critical functions in gene regulation, cell growth, and somatic differentiation. Alternatively spliced transcript variants have been described. CSRP1 ENSG00000159176
351 amyloid beta precursor protein This gene encodes a cell surface receptor and transmembrane precursor protein that is cleaved by secretases to form a number of peptides. Some of these peptides are secreted and can bind to the acetyltransferase complex APBB1/TIP60 to promote transcriptional activation, while others form the protein basis of the amyloid plaques found in the brains of patients with Alzheimer disease. In addition, two of the peptides are antimicrobial peptides, having been shown to have bactericidal and antifungal activities. Mutations in this gene have been implicated in autosomal dominant Alzheimer disease and cerebroarterial amyloidosis (cerebral amyloid angiopathy). Multiple transcript variants encoding several different isoforms have been found for this gene. APP ENSG00000142192
ENSG00000225630 mitochondrially encoded NADH:ubiquinone oxidoreductase core subunit 2 pseudogene 28 NA MTND2P28 ENSG00000225630
2037 erythrocyte membrane protein band 4.1 like 2 NA EPB41L2 ENSG00000079819
8531 Y-box binding protein 3 NA YBX3 ENSG00000060138
1266 calponin 3 This gene encodes a protein with a markedly acidic C terminus; the basic N-terminus is highly homologous to the N-terminus of a related gene, CNN1. Members of the CNN gene family all contain similar tandemly repeated motifs. This encoded protein is associated with the cytoskeleton but is not involved in contraction. CNN3 ENSG00000117519
2512 ferritin, light polypeptide This gene encodes the light subunit of the ferritin protein. Ferritin is the major intracellular iron storage protein in prokaryotes and eukaryotes. It is composed of 24 subunits of the heavy and light ferritin chains. Variation in ferritin subunit composition may affect the rates of iron uptake and release in different tissues. A major function of ferritin is the storage of iron in a soluble and nontoxic state. Defects in this light chain ferritin gene are associated with several neurodegenerative diseases and hyperferritinemia-cataract syndrome. This gene has multiple pseudogenes. FTL ENSG00000087086
5159 platelet derived growth factor receptor beta This gene encodes a cell surface tyrosine kinase receptor for members of the platelet-derived growth factor family. These growth factors are mitogens for cells of mesenchymal origin. The identity of the growth factor bound to a receptor monomer determines whether the functional receptor is a homodimer or a heterodimer, composed of both platelet-derived growth factor receptor alpha and beta polypeptides. This gene is flanked on chromosome 5 by the genes for granulocyte-macrophage colony-stimulating factor and macrophage-colony stimulating factor receptor; all three genes may be implicated in the 5-q syndrome. A translocation between chromosomes 5 and 12, that fuses this gene to that of the translocation, ETV6, leukemia gene, results in chronic myeloproliferative disorder with eosinophilia. PDGFRB ENSG00000113721
3959 galectin 3 binding protein The galectins are a family of beta-galactoside-binding proteins implicated in modulating cell-cell and cell-matrix interactions. LGALS3BP has been found elevated in the serum of patients with cancer and in those infected by the human immunodeficiency virus (HIV). It appears to be implicated in immune response associated with natural killer (NK) and lymphokine-activated killer (LAK) cell cytotoxicity. Using fluorescence in situ hybridization the full length 90K cDNA has been localized to chromosome 17q25. The native protein binds specifically to a human macrophage-associated lectin known as Mac-2 and also binds galectin 1. LGALS3BP ENSG00000108679
805 calmodulin 2 (phosphorylase kinase, delta) This gene is a member of the calmodulin gene family. There are three distinct calmodulin genes dispersed throughout the genome that encode the identical protein, but differ at the nucleotide level. Calmodulin is a calcium binding protein that plays a role in signaling pathways, cell cycle progression and proliferation. Several infants with severe forms of long-QT syndrome (LQTS) who displayed life-threatening ventricular arrhythmias together with delayed neurodevelopment and epilepsy were found to have mutations in either this gene or another member of the calmodulin gene family (PMID:23388215). Mutations in this gene have also been identified in patients with less severe forms of LQTS (PMID:24917665), while mutations in another calmodulin gene family member have been associated with catecholaminergic polymorphic ventricular tachycardia (CPVT)(PMID:23040497), a rare disorder thought to be the cause of a significant fraction of sudden cardiac deaths in young individuals. Pseudogenes of this gene are found on chromosomes 10, 13, and 17. Alternative splicing results in multiple transcript variants encoding different isoforms. CALM2 ENSG00000143933
23313 KIAA0930 NA KIAA0930 ENSG00000100364
1152 creatine kinase B The protein encoded by this gene is a cytoplasmic enzyme involved in energy homeostasis. The encoded protein reversibly catalyzes the transfer of phosphate between ATP and various phosphogens such as creatine phosphate. It acts as a homodimer in brain as well as in other tissues, and as a heterodimer with a similar muscle isozyme in heart. The encoded protein is a member of the ATP:guanido phosphotransferase protein family. A pseudogene of this gene has been characterized. CKB ENSG00000166165
1293 collagen type VI alpha 3 chain This gene encodes the alpha-3 chain, one of the three alpha chains of type VI collagen, a beaded filament collagen found in most connective tissues. The alpha-3 chain of type VI collagen is much larger than the alpha-1 and -2 chains. This difference in size is largely due to an increase in the number of subdomains, similar to von Willebrand Factor type A domains, that are found in the amino terminal globular domain of all the alpha chains. These domains have been shown to bind extracellular matrix proteins, an interaction that explains the importance of this collagen in organizing matrix components. Mutations in the type VI collagen genes are associated with Bethlem myopathy, a rare autosomal dominant proximal myopathy with early childhood onset. Mutations in this gene are also a cause of Ullrich congenital muscular dystrophy, also referred to as Ullrich scleroatonic muscular dystrophy, an autosomal recessive congenital myopathy that is more severe than Bethlem myopathy. Multiple transcript variants have been identified, but the full-length nature of only some of these variants has been described. COL6A3 ENSG00000163359
7038 thyroglobulin Thyroglobulin (Tg) is a glycoprotein homodimer produced predominantly by the thyroid gland. It acts as a substrate for the synthesis of thyroxine and triiodothyronine as well as the storage of the inactive forms of thyroid hormone and iodine. Thyroglobulin is secreted from the endoplasmic reticulum to its site of iodination, and subsequent thyroxine biosynthesis, in the follicular lumen. Mutations in this gene cause thyroid dyshormonogenesis, manifested as goiter, and are associated with moderate to severe congenital hypothyroidism. Polymorphisms in this gene are associated with susceptibility to autoimmune thyroid diseases (AITD) such as Graves disease and Hashimoto thyroiditis. TG ENSG00000042832
1471 cystatin C The cystatin superfamily encompasses proteins that contain multiple cystatin-like sequences. Some of the members are active cysteine protease inhibitors, while others have lost or perhaps never acquired this inhibitory activity. There are three inhibitory families in the superfamily, including the type 1 cystatins (stefins), type 2 cystatins and the kininogens. The type 2 cystatin proteins are a class of cysteine proteinase inhibitors found in a variety of human fluids and secretions, where they appear to provide protective functions. The cystatin locus on chromosome 20 contains the majority of the type 2 cystatin genes and pseudogenes. This gene is located in the cystatin locus and encodes the most abundant extracellular inhibitor of cysteine proteases, which is found in high concentrations in biological fluids and is expressed in virtually all organs of the body. A mutation in this gene has been associated with amyloid angiopathy. Expression of this protein in vascular wall smooth muscle cells is severely reduced in both atherosclerotic and aneurysmal aortic lesions, establishing its role in vascular disease. In addition, this protein has been shown to have an antimicrobial function, inhibiting the replication of herpes simplex virus. Alternative splicing results in multiple transcript variants encoding a single protein. CST3 ENSG00000101439
5376 peripheral myelin protein 22 This gene encodes an integral membrane protein that is a major component of myelin in the peripheral nervous system. Studies suggest two alternately used promoters drive tissue-specific expression. Various mutations of this gene are causes of Charcot-Marie-Tooth disease Type IA, Dejerine-Sottas syndrome, and hereditary neuropathy with liability to pressure palsies. Alternative splicing results in multiple transcript variants. PMP22 ENSG00000109099
3320 heat shock protein 90kDa alpha family class A member 1 The protein encoded by this gene is an inducible molecular chaperone that functions as a homodimer. The encoded protein aids in the proper folding of specific target proteins by use of an ATPase activity that is modulated by co-chaperones. Two transcript variants encoding different isoforms have been found for this gene. HSP90AA1 ENSG00000080824
3106 major histocompatibility complex, class I, B HLA-B belongs to the HLA class I heavy chain paralogues. This class I molecule is a heterodimer consisting of a heavy chain and a light chain (beta-2 microglobulin). The heavy chain is anchored in the membrane. Class I molecules play a central role in the immune system by presenting peptides derived from the endoplasmic reticulum lumen. They are expressed in nearly all cells. The heavy chain is approximately 45 kDa and its gene contains 8 exons. Exon 1 encodes the leader peptide, exon 2 and 3 encode the alpha1 and alpha2 domains, which both bind the peptide, exon 4 encodes the alpha3 domain, exon 5 encodes the transmembrane region and exons 6 and 7 encode the cytoplasmic tail. Polymorphisms within exon 2 and exon 3 are responsible for the peptide binding specificity of each class one molecule. Typing for these polymorphisms is routinely done for bone marrow and kidney transplantation. Hundreds of HLA-B alleles have been described. HLA-B ENSG00000234745
219654 zinc finger CCHC-type containing 24 NA ZCCHC24 ENSG00000165424
3912 laminin subunit beta 1 Laminins, a family of extracellular matrix glycoproteins, are the major noncollagenous constituent of basement membranes. They have been implicated in a wide variety of biological processes including cell adhesion, differentiation, migration, signaling, neurite outgrowth and metastasis. Laminins are composed of 3 non identical chains: laminin alpha, beta and gamma (formerly A, B1, and B2, respectively) and they form a cruciform structure consisting of 3 short arms, each formed by a different chain, and a long arm composed of all 3 chains. Each laminin chain is a multidomain protein encoded by a distinct gene. Several isoforms of each chain have been described. Different alpha, beta and gamma chain isomers combine to give rise to different heterotrimeric laminin isoforms which are designated by Arabic numerals in the order of their discovery, i.e. alpha1beta1gamma1 heterotrimer is laminin 1. The biological functions of the different chains and trimer molecules are largely unknown, but some of the chains have been shown to differ with respect to their tissue distribution, presumably reflecting diverse functions in vivo. This gene encodes the beta chain isoform laminin, beta 1. The beta 1 chain has 7 structurally distinct domains which it shares with other beta chain isomers. The C-terminal helical region containing domains I and II are separated by domain alpha, domains III and V contain several EGF-like repeats, and domains IV and VI have a globular conformation. Laminin, beta 1 is expressed in most tissues that produce basement membranes, and is one of the 3 chains constituting laminin 1, the first laminin isolated from Engelbreth-Holm-Swarm (EHS) tumor. A sequence in the beta 1 chain that is involved in cell attachment, chemotaxis, and binding to the laminin receptor was identified and shown to have the capacity to inhibit metastasis. LAMB1 ENSG00000091136
4735 septin 2 NA SEPT2 ENSG00000168385
7267 tetratricopeptide repeat domain 3 NA TTC3 ENSG00000182670
151887 coiled-coil domain containing 80 NA CCDC80 ENSG00000091986
7311 ubiquitin A-52 residue ribosomal protein fusion product 1 Ubiquitin is a highly conserved nuclear and cytoplasmic protein that has a major role in targeting cellular proteins for degradation by the 26S proteasome. It is also involved in the maintenance of chromatin structure, the regulation of gene expression, and the stress response. Ubiquitin is synthesized as a precursor protein consisting of either polyubiquitin chains or a single ubiquitin moiety fused to an unrelated protein. This gene encodes a fusion protein consisting of ubiquitin at the N terminus and ribosomal protein L40 at the C terminus, a C-terminal extension protein (CEP). Multiple processed pseudogenes derived from this gene are present in the genome. UBA52 ENSG00000221983
8522 growth arrest specific 7 Growth arrest-specific 7 is expressed primarily in terminally differentiated brain cells and predominantly in mature cerebellar Purkinje neurons. GAS7 plays a putative role in neuronal development. Several transcript variants encoding proteins which vary in the N-terminus have been described. GAS7 ENSG00000007237
7805 lysosomal protein transmembrane 5 This gene encodes a transmembrane receptor that is associated with lysosomes. The encoded protein, also known as E3 protein, may play a role in hematopoiesis. LAPTM5 ENSG00000162511
9839 zinc finger E-box binding homeobox 2 The protein encoded by this gene is a member of the Zfh1 family of 2-handed zinc finger/homeodomain proteins. It is located in the nucleus and functions as a DNA-binding transcriptional repressor that interacts with activated SMADs. Mutations in this gene are associated with Hirschsprung disease/Mowat-Wilson syndrome. Alternatively spliced transcript variants have been found for this gene. ZEB2 ENSG00000169554
667 dystonin This gene encodes a member of the plakin protein family of adhesion junction plaque proteins. Multiple alternatively spliced transcript variants encoding distinct isoforms have been found for this gene, but the full-length nature of some variants has not been defined. It has been reported that some isoforms are expressed in neural and muscle tissue, anchoring neural intermediate filaments to the actin cytoskeleton, and some isoforms are expressed in epithelial tissue, anchoring keratin-containing intermediate filaments to hemidesmosomes. Consistent with the expression, mice defective for this gene show skin blistering and neurodegeneration. DST ENSG00000151914
64423 inverted formin, FH2 and WH2 domain containing This gene represents a member of the formin family of proteins. It is considered a diaphanous formin due to the presence of a diaphanous inhibitory domain located at the N-terminus of the encoded protein. Studies of a similar mouse protein indicate that the protein encoded by this locus may function in polymerization and depolymerization of actin filaments. Mutations at this locus have been associated with focal segmental glomerulosclerosis 5. INF2 ENSG00000203485
1307 collagen type XVI alpha 1 chain This gene encodes the alpha chain of type XVI collagen, a member of the FACIT collagen family (fibril-associated collagens with interrupted helices). Members of this collagen family are found in association with fibril-forming collagens such as type I and II, and serve to maintain the integrity of the extracellular matrix. High levels of type XVI collagen have been found in fibroblasts and keratinocytes, and in smooth muscle and amnion. COL16A1 ENSG00000084636
1191 clusterin The protein encoded by this gene is a secreted chaperone that can under some stress conditions also be found in the cell cytosol. It has been suggested to be involved in several basic biological events such as cell death, tumor progression, and neurodegenerative disorders. Alternate splicing results in both coding and non-coding variants. CLU ENSG00000120885
3572 interleukin 6 signal transducer The protein encoded by this gene is a signal transducer shared by many cytokines, including interleukin 6 (IL6), ciliary neurotrophic factor (CNTF), leukemia inhibitory factor (LIF), and oncostatin M (OSM). This protein functions as a part of the cytokine receptor complex. The activation of this protein is dependent upon the binding of cytokines to their receptors. vIL6, a protein related to IL6 and encoded by the Kaposi sarcoma-associated herpesvirus, can bypass the interleukin 6 receptor (IL6R) and directly activate this protein. Knockout studies in mice suggest that this gene plays a critical role in regulating myocyte apoptosis. Alternatively spliced transcript variants have been described. A related pseudogene has been identified on chromosome 17. IL6ST ENSG00000134352
10313 reticulon 3 This gene belongs to the reticulon family of highly conserved genes that are preferentially expressed in neuroendocrine tissues. This family of proteins interact with, and modulate the activity of beta-amyloid converting enzyme 1 (BACE1), and the production of amyloid-beta. An increase in the expression of any reticulon protein substantially reduces the production of amyloid-beta, suggesting that reticulon proteins are negative modulators of BACE1 in cells. Alternatively spliced transcript variants encoding different isoforms have been found for this gene, and pseudogenes of this gene are located on chromosomes 4 and 12. RTN3 ENSG00000133318
821 calnexin This gene encodes a member of the calnexin family of molecular chaperones. The encoded protein is a calcium-binding, endoplasmic reticulum (ER)-associated protein that interacts transiently with newly synthesized N-linked glycoproteins, facilitating protein folding and assembly. It may also play a central role in the quality control of protein folding by retaining incorrectly folded protein subunits within the ER for degradation. Alternatively spliced transcript variants encoding the same protein have been described. CANX ENSG00000127022
216 aldehyde dehydrogenase 1 family member A1 The protein encoded by this gene belongs to the aldehyde dehydrogenase family. Aldehyde dehydrogenase is the next enzyme after alcohol dehydrogenase in the major pathway of alcohol metabolism. There are two major aldehyde dehydrogenase isozymes in the liver, cytosolic and mitochondrial, which are encoded by distinct genes, and can be distinguished by their electrophoretic mobility, kinetic properties, and subcellular localization. This gene encodes the cytosolic isozyme. Studies in mice show that through its role in retinol metabolism, this gene may also be involved in the regulation of the metabolic responses to high-fat diet. ALDH1A1 ENSG00000165092
4060 lumican This gene encodes a member of the small leucine-rich proteoglycan (SLRP) family that includes decorin, biglycan, fibromodulin, keratocan, epiphycan, and osteoglycin. In these bifunctional molecules, the protein moiety binds collagen fibrils and the highly charged hydrophilic glycosaminoglycans regulate interfibrillar spacings. Lumican is the major keratan sulfate proteoglycan of the cornea but is also distributed in interstitial collagenous matrices throughout the body. Lumican may regulate collagen fibril organization and circumferential growth, corneal transparency, and epithelial cell migration and tissue repair. LUM ENSG00000139329
53826 FXYD domain containing ion transport regulator 6 This gene encodes a member of the FXYD family of transmembrane proteins. This particular protein encodes phosphohippolin, which likely affects the activity of Na,K-ATPase. Multiple alternatively spliced transcript variants encoding the same protein have been described. Related pseudogenes have been identified on chromosomes 10 and X. Read-through transcripts have been observed between this locus and the downstream sodium/potassium-transporting ATPase subunit gamma (FXYD2, GeneID 486) locus. FXYD6 ENSG00000137726
3860 keratin 13 The protein encoded by this gene is a member of the keratin gene family. The keratins are intermediate filament proteins responsible for the structural integrity of epithelial cells and are subdivided into cytokeratins and hair keratins. Most of the type I cytokeratins consist of acidic proteins which are arranged in pairs of heterotypic keratin chains. This type I cytokeratin is paired with keratin 4 and expressed in the suprabasal layers of non-cornified stratified epithelia. Mutations in this gene and keratin 4 have been associated with the autosomal dominant disorder White Sponge Nevus. The type I cytokeratins are clustered in a region of chromosome 17q21.2. Alternative splicing of this gene results in multiple transcript variants; however, not all variants have been described. KRT13 ENSG00000171401
11031 RAB31, member RAS oncogene family Small GTP-binding proteins of the RAB family, such as RAB31, play essential roles in vesicle and granule targeting (Bao et al., 2002 [PubMed 11784320]). RAB31 ENSG00000168461
54541 DNA damage inducible transcript 4 NA DDIT4 ENSG00000168209
57447 NDRG family member 2 This gene is a member of the N-myc downregulated gene family which belongs to the alpha/beta hydrolase superfamily. The protein encoded by this gene is a cytoplasmic protein that may play a role in neurite outgrowth. This gene may be involved in glioblastoma carcinogenesis. Several alternatively spliced transcript variants of this gene have been described, but the full-length nature of some of these variants has not been determined. NDRG2 ENSG00000165795
54587 matrix remodeling associated 8 NA MXRA8 ENSG00000162576
10160 FERM, ARH/RhoGEF and pleckstrin domain protein 1 This gene encodes a protein containing a FERM (4.1, ezrin, radixin, moesin) domain, a Dbl homology domain, and two pleckstrin homology domains. These domains are found in guanine nucleotide exchange factors and proteins that link the cytoskeleton to the cell membrane. The encoded protein functions in neurons to promote dendritic growth. Alternative splicing results in multiple transcript variants. FARP1 ENSG00000152767
7184 heat shock protein 90kDa beta family member 1 This gene encodes a member of a family of adenosine triphosphate(ATP)-metabolizing molecular chaperones with roles in stabilizing and folding other proteins. The encoded protein is localized to melanosomes and the endoplasmic reticulum. Expression of this protein is associated with a variety of pathogenic states, including tumor formation. There is a microRNA gene located within the 5’ exon of this gene. There are pseudogenes for this gene on chromosomes 1 and 15. HSP90B1 ENSG00000166598
57608 KIAA1462 NA KIAA1462 ENSG00000165757
1289 collagen type V alpha 1 This gene encodes an alpha chain for one of the low abundance fibrillar collagens. Fibrillar collagen molecules are trimers that can be composed of one or more types of alpha chains. Type V collagen is found in tissues containing type I collagen and appears to regulate the assembly of heterotypic fibers composed of both type I and type V collagen. This gene product is closely related to type XI collagen and it is possible that the collagen chains of types V and XI constitute a single collagen type with tissue-specific chain combinations. The encoded procollagen protein occurs commonly as the heterotrimer pro-alpha1(V)-pro-alpha1(V)-pro-alpha2(V). Mutations in this gene are associated with Ehlers-Danlos syndrome, types I and II. Alternative splicing of this gene results in multiple transcript variants. COL5A1 ENSG00000130635
9782 matrin 3 This gene encodes a nuclear matrix protein, which is proposed to stabilize certain messenger RNA species. Mutations of this gene are associated with distal myopathy 2, which often includes vocal cord and pharyngeal weakness. Alternatively spliced transcript variants, including read-through transcripts composed of the upstream small nucleolar RNA host gene 4 (non-protein coding) and matrin 3 gene sequence, have been identified. Pseudogenes of this gene are located on chromosomes 1 and X. MATR3 ENSG00000015479
9590 A-kinase anchoring protein 12 The A-kinase anchor proteins (AKAPs) are a group of structurally diverse proteins, which have the common function of binding to the regulatory subunit of protein kinase A (PKA) and confining the holoenzyme to discrete locations within the cell. This gene encodes a member of the AKAP family. The encoded protein is expressed in endothelial cells, cultured fibroblasts, and osteosarcoma cells. It associates with protein kinases A and C and phosphatase, and serves as a scaffold protein in signal transduction. This protein and RII PKA colocalize at the cell periphery. This protein is a cell growth-related protein. Antibodies to this protein can be produced by patients with myasthenia gravis. Alternative splicing of this gene results in two transcript variants encoding different isoforms. AKAP12 ENSG00000131016
79026 AHNAK nucleoprotein NA AHNAK ENSG00000124942
1281 collagen type III alpha 1 chain This gene encodes the pro-alpha1 chains of type III collagen, a fibrillar collagen that is found in extensible connective tissues such as skin, lung, uterus, intestine and the vascular system, frequently in association with type I collagen. Mutations in this gene are associated with Ehlers-Danlos syndrome type IV, and with aortic and arterial aneurysms. Two transcripts, resulting from the use of alternate polyadenylation signals, have been identified for this gene. COL3A1 ENSG00000168542
9902 mannose receptor C type 2 This gene encodes a member of the mannose receptor family of proteins that contain a fibronectin type II domain and multiple C-type lectin-like domains. The encoded protein plays a role in extracellular matrix remodeling by mediating the internalization and lysosomal degradation of collagen ligands. Expression of this gene may play a role in the tumorigenesis and metastasis of several malignancies including breast cancer, gliomas and metastatic bone disease. MRC2 ENSG00000011028
3672 integrin subunit alpha 1 This gene encodes the alpha 1 subunit of integrin receptors. This protein heterodimerizes with the beta 1 subunit to form a cell-surface receptor for collagen and laminin. The heterodimeric receptor is involved in cell-cell adhesion and may play a role in inflammation and fibrosis. The alpha 1 subunit contains an inserted (I) von Willebrand factor type I domain which is thought to be involved in collagen binding. ITGA1 ENSG00000213949
51312 solute carrier family 25 member 37 SLC25A37 is a solute carrier localized in the mitochondrial inner membrane. It functions as an essential iron importer for the synthesis of mitochondrial heme and iron-sulfur clusters (summary by Chen et al., 2009 [PubMed 19805291]). SLC25A37 ENSG00000147454
1284 collagen type IV alpha 2 This gene encodes one of the six subunits of type IV collagen, the major structural component of basement membranes. The C-terminal portion of the protein, known as canstatin, is an inhibitor of angiogenesis and tumor growth. Like the other members of the type IV collagen gene family, this gene is organized in a head-to-head conformation with another type IV collagen gene so that each gene pair shares a common promoter. COL4A2 ENSG00000134871
55544 RNA binding motif protein 38 NA RBM38 ENSG00000132819
643314 KIAA0754 NA KIAA0754 ENSG00000127603
23499 microtubule-actin crosslinking factor 1 This gene encodes a large protein containing numerous spectrin and leucine-rich repeat (LRR) domains. The encoded protein is a member of a family of proteins that form bridges between different cytoskeletal elements. This protein facilitates actin-microtubule interactions at the cell periphery and couples the microtubule network to cellular junctions. Alternative splicing results in multiple transcript variants, but the full-length nature of some of these variants has not been determined. MACF1 ENSG00000127603
7009 transmembrane BAX inhibitor motif containing 6 NA TMBIM6 ENSG00000139644
1634 decorin This gene encodes a member of the small leucine-rich proteoglycan family of proteins. Alternative splicing results in multiple transcript variants, at least one of which encodes a preproprotein that is proteolytically processed to generate the mature protein. This protein plays a role in collagen fibril assembly. Binding of this protein to multiple cell surface receptors mediates its role in tumor suppression, including a stimulatory effect on autophagy and inflammation and an inhibitory effect on angiogenesis and tumorigenesis. This gene and the related gene biglycan are thought to be the result of a gene duplication. Mutations in this gene are associated with congenital stromal corneal dystrophy in human patients. DCN ENSG00000011465
301 annexin A1 This gene encodes a membrane-localized protein that binds phospholipids. This protein inhibits phospholipase A2 and has anti-inflammatory activity. Loss of function or expression of this gene has been detected in multiple tumors. ANXA1 ENSG00000135046
2202 EGF containing fibulin like extracellular matrix protein 1 This gene encodes a member of the fibulin family of extracellular matrix glycoproteins. Like all members of this family, the encoded protein contains tandemly repeated epidermal growth factor-like repeats followed by a C-terminus fibulin-type domain. This gene is upregulated in malignant gliomas and may play a role in the aggressive nature of these tumors. Mutations in this gene are associated with Doyne honeycomb retinal dystrophy. Alternatively spliced transcript variants that encode the same protein have been described. EFEMP1 ENSG00000115380
1282 collagen type IV alpha 1 chain This gene encodes a type IV collagen alpha protein. Type IV collagen proteins are integral components of basement membranes. This gene shares a bidirectional promoter with a paralogous gene on the opposite strand. The protein consists of an amino-terminal 7S domain, a triple-helix forming collagenous domain, and a carboxy-terminal non-collagenous domain. It functions as part of a heterotrimer and interacts with other extracellular matrix components such as perlecans, proteoglycans, and laminins. In addition, proteolytic cleavage of the non-collagenous carboxy-terminal domain results in a biologically active fragment known as arresten, which has anti-angiogenic and tumor suppressor properties. Mutations in this gene cause porencephaly, cerebrovascular disease, and renal and muscular defects. Alternative splicing results in multiple transcript variants. COL4A1 ENSG00000187498
58 actin, alpha 1, skeletal muscle The product encoded by this gene belongs to the actin family of proteins, which are highly conserved proteins that play a role in cell motility, structure and integrity. Alpha, beta and gamma actin isoforms have been identified, with alpha actins being a major constituent of the contractile apparatus, while beta and gamma actins are involved in the regulation of cell motility. This actin is an alpha actin that is found in skeletal muscle. Mutations in this gene cause nemaline myopathy type 3, congenital myopathy with excess of thin myofilaments, congenital myopathy with cores, and congenital myopathy with fiber-type disproportion, diseases that lead to muscle fiber defects. ACTA1 ENSG00000143632
5730 prostaglandin D2 synthase The protein encoded by this gene is a glutathione-independent prostaglandin D synthase that catalyzes the conversion of prostaglandin H2 (PGH2) to prostaglandin D2 (PGD2). PGD2 functions as a neuromodulator as well as a trophic factor in the central nervous system. PGD2 is also involved in smooth muscle contraction/relaxation and is a potent inhibitor of platelet aggregation. This gene is preferentially expressed in brain. Studies with transgenic mice overexpressing this gene suggest that this gene may also be involved in the regulation of non-rapid eye movement sleep. PTGDS ENSG00000107317
5064 paralemmin This gene encodes a member of the paralemmin protein family. The product of this gene is a prenylated and palmitoylated phosphoprotein that associates with the cytoplasmic face of plasma membranes and is implicated in plasma membrane dynamics in neurons and other cell types. Several alternatively spliced transcript variants have been identified, but the full-length nature of only two transcript variants has been determined. PALM ENSG00000099864
9639 Rho guanine nucleotide exchange factor 10 This gene encodes a Rho guanine nucleotide exchange factor (GEF). Rho GEFs regulate the activity of small Rho GTPases by stimulating the exchange of guanine diphosphate (GDP) for guanine triphosphate (GTP) and may play a role in neural morphogenesis. Mutations in this gene are associated with slowed nerve conduction velocity (SNCV). Alternative splicing results in multiple transcript variants. ARHGEF10 ENSG00000104728
1809 dihydropyrimidinase like 3 NA DPYSL3 ENSG00000113657
813 calumenin The product of this gene is a calcium-binding protein localized in the endoplasmic reticulum (ER) and it is involved in such ER functions as protein folding and sorting. This protein belongs to a family of multiple EF-hand proteins (CERC) that include reticulocalbin, ERC-55, and Cab45 and the product of this gene. Alternatively spliced transcript variants encoding different isoforms have been identified. CALU ENSG00000128595
5327 plasminogen activator, tissue type This gene encodes tissue-type plasminogen activator, a secreted serine protease that converts the proenzyme plasminogen to plasmin, a fibrinolytic enzyme. The encoded preproprotein is proteolytically processed by plasmin or trypsin to generate heavy and light chains. These chains associate via disulfide linkages to form the heterodimeric enzyme. This enzyme plays a role in cell migration and tissue remodeling. Increased enzymatic activity causes hyperfibrinolysis, which manifests as excessive bleeding, while decreased activity leads to hypofibrinolysis, which can result in thrombosis or embolism. Alternative splicing of this gene results in multiple transcript variants, at least one of which encodes an isoform that is proteolytically processed. PLAT ENSG00000104368
7026 nuclear receptor subfamily 2 group F member 2 This gene encodes a member of the steroid thyroid hormone superfamily of nuclear receptors. The encoded protein is a ligand inducible transcription factor that is involved in the regulation of many different genes. Alternate splicing results in multiple transcript variants. NR2F2 ENSG00000185551
9770 Ras association domain family member 2 This gene encodes a protein that contains a Ras association domain. Similar to its cattle and sheep counterparts, this gene is located near the prion gene. Two alternatively spliced transcripts encoding the same isoform have been reported. RASSF2 ENSG00000101265
8613 phospholipid phosphatase 3 The protein encoded by this gene is a member of the phosphatidic acid phosphatase (PAP) family. PAPs convert phosphatidic acid to diacylglycerol, and function in de novo synthesis of glycerolipids as well as in receptor-activated signal transduction mediated by phospholipase D. This protein is a membrane glycoprotein localized at the cell plasma membrane. It has been shown to actively hydrolyze extracellular lysophosphatidic acid and short-chain phosphatidic acid. The expression of this gene is found to be enhanced by epidermal growth factor in Hela cells. PLPP3 ENSG00000162407
ENSG00000251322 SH3 and multiple ankyrin repeat domains 3 NA SHANK3 ENSG00000251322
9741 lysosomal protein transmembrane 4 alpha This gene encodes a protein that has four predicted transmembrane domains. The function of this gene has not yet been determined; however, studies in the mouse homolog suggest a role in the transport of small molecules across endosomal and lysosomal membranes. LAPTM4A ENSG00000068697
7077 TIMP metallopeptidase inhibitor 2 This gene is a member of the TIMP gene family. The proteins encoded by this gene family are natural inhibitors of the matrix metalloproteinases, a group of peptidases involved in degradation of the extracellular matrix. In addition to an inhibitory role against metalloproteinases, the encoded protein has a unique role among TIMP family members in its ability to directly suppress the proliferation of endothelial cells. As a result, the encoded protein may be critical to the maintenance of tissue homeostasis by suppressing the proliferation of quiescent tissues in response to angiogenic factors, and by inhibiting protease activity in tissues undergoing remodelling of the extracellular matrix. TIMP2 ENSG00000035862
1291 collagen type VI alpha 1 The collagens are a superfamily of proteins that play a role in maintaining the integrity of various tissues. Collagens are extracellular matrix proteins and have a triple-helical domain as their common structural element. Collagen VI is a major structural component of microfibrils. The basic structural unit of collagen VI is a heterotrimer of the alpha1(VI), alpha2(VI), and alpha3(VI) chains. The alpha2(VI) and alpha3(VI) chains are encoded by the COL6A2 and COL6A3 genes, respectively. The protein encoded by this gene is the alpha 1 subunit of type VI collagen (alpha1(VI) chain). Mutations in the genes that code for the collagen VI subunits result in the autosomal dominant disorder, Bethlem myopathy. COL6A1 ENSG00000142156
60681 FK506 binding protein 10 The protein encoded by this gene belongs to the FKBP-type peptidyl-prolyl cis/trans isomerase (PPIase) family. This protein localizes to the endoplasmic reticulum and acts as a molecular chaperone. Alternatively spliced variants encoding different isoforms have been reported, but their biological validity has not been determined. FKBP10 ENSG00000141756
6507 solute carrier family 1 member 3 This gene encodes a member of a high affinity glutamate transporter family. This gene functions in the termination of excitatory neurotransmission in the central nervous system. Mutations are associated with episodic ataxia, Type 6. Alternative splicing results in multiple transcript variants. SLC1A3 ENSG00000079215
7531 tyrosine 3-monooxygenase/tryptophan 5-monooxygenase activation protein epsilon This gene product belongs to the 14-3-3 family of proteins which mediate signal transduction by binding to phosphoserine-containing proteins. This highly conserved protein family is found in both plants and mammals, and this protein is 100% identical to the mouse ortholog. It interacts with CDC25 phosphatases, RAF1 and IRS1 proteins, suggesting its role in diverse biochemical activities related to signal transduction, such as cell division and regulation of insulin sensitivity. It has also been implicated in the pathogenesis of small cell lung cancer. Two transcript variants, one protein-coding and the other non-protein-coding, have been found for this gene. YWHAE ENSG00000108953
4703 nebulin This gene encodes nebulin, a giant protein component of the cytoskeletal matrix that coexists with the thick and thin filaments within the sarcomeres of skeletal muscle. In most vertebrates, nebulin accounts for 3 to 4% of the total myofibrillar protein. The encoded protein contains approximately 30-amino acid long modules that can be classified into 7 types and other repeated modules. Protein isoform sizes vary from 600 to 800 kD due to alternative splicing that is tissue-, species-, and developmental stage-specific. Of the 183 exons in the nebulin gene, at least 43 are alternatively spliced, although exons 143 and 144 are not found in the same transcript. Of the several thousand transcript variants predicted for nebulin, the RefSeq Project has decided to create three representative RefSeq records. Mutations in this gene are associated with recessive nemaline myopathy. NEB ENSG00000183091
6515 solute carrier family 2 member 3 NA SLC2A3 ENSG00000059804
write.table(as.factor(out$query), paste0("../utilities/GTEX2013_sparse_load_sqrt/gene_names_clus_",6,".txt"), col.names = FALSE,
            row.names=FALSE, quote=FALSE);
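
The `write.table` call above hard-codes the factor index in the file name. A minimal sketch of the same step as a reusable helper is shown below. The function name `write_factor_genes` and the use of `tempdir()` are assumptions made for illustration and are not part of the original analysis; the two Ensembl IDs are taken from the Factor 6 table.

```r
# Hypothetical helper (not part of the original script): write the Ensembl IDs
# for one factor to "gene_names_clus_<factor_index>.txt" inside out_dir.
write_factor_genes <- function(ids, factor_index, out_dir) {
  path <- file.path(out_dir, paste0("gene_names_clus_", factor_index, ".txt"))
  # One ID per line, with no header, row names, or quotes,
  # matching the options used in the call above.
  write.table(ids, path, col.names = FALSE, row.names = FALSE, quote = FALSE)
  invisible(path)
}

# Usage with two IDs from the Factor 6 table; tempdir() stands in for
# the "../utilities/GTEX2013_sparse_load_sqrt/" directory.
ids <- c("ENSG00000011465", "ENSG00000135046")
path <- write_factor_genes(ids, 6, tempdir())
readLines(path)
```

In the analysis itself, `ids` would be `out$query` and `out_dir` the utilities directory, so each factor section could call the helper with its own index.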

Factor 7 Annotations

out <- mygene::queryMany(gene_list[7,],  scopes="ensembl.gene", fields=c("name", "summary", "symbol"), species="human");
## Finished
## Pass returnall=TRUE to return lists of duplicate or missing query terms.
kable(as.data.frame(out))
query symbol X_id summary name notfound
ENSG00000244734 HBB 3043 The alpha (HBA) and beta (HBB) loci determine the structure of the 2 types of polypeptide chains in adult hemoglobin, Hb A. The normal adult hemoglobin tetramer consists of two alpha chains and two beta chains. Mutant beta globin causes sickle cell anemia. Absence of beta chain causes beta-zero-thalassemia. Reduced amounts of detectable beta globin causes beta-plus-thalassemia. The order of the genes in the beta-globin cluster is 5’-epsilon – gamma-G – gamma-A – delta – beta–3’. hemoglobin subunit beta NA
ENSG00000197616 MYH6 4624 Cardiac muscle myosin is a hexamer consisting of two heavy chain subunits, two light chain subunits, and two regulatory subunits. This gene encodes the alpha heavy chain subunit of cardiac myosin. The gene is located 4kb downstream of the gene encoding the beta heavy chain subunit of cardiac myosin. Mutations in this gene cause familial hypertrophic cardiomyopathy and atrial septal defect 3. myosin, heavy chain 6, cardiac muscle, alpha NA
ENSG00000165795 NDRG2 57447 This gene is a member of the N-myc downregulated gene family which belongs to the alpha/beta hydrolase superfamily. The protein encoded by this gene is a cytoplasmic protein that may play a role in neurite outgrowth. This gene may be involved in glioblastoma carcinogenesis. Several alternatively spliced transcript variants of this gene have been described, but the full-length nature of some of these variants has not been determined. NDRG family member 2 NA
ENSG00000135821 GLUL 2752 The protein encoded by this gene belongs to the glutamine synthetase family. It catalyzes the synthesis of glutamine from glutamate and ammonia in an ATP-dependent reaction. This protein plays a role in ammonia and glutamate detoxification, acid-base homeostasis, cell signaling, and cell proliferation. Glutamine is an abundant amino acid, and is important to the biosynthesis of several amino acids, pyrimidines, and purines. Mutations in this gene are associated with congenital glutamine deficiency, and overexpression of this gene was observed in some primary liver cancer samples. There are six pseudogenes of this gene found on chromosomes 2, 5, 9, 11, and 12. Alternative splicing results in multiple transcript variants. glutamate-ammonia ligase NA
ENSG00000104879 CKM 1158 The protein encoded by this gene is a cytoplasmic enzyme involved in energy homeostasis and is an important serum marker for myocardial infarction. The encoded protein reversibly catalyzes the transfer of phosphate between ATP and various phosphogens such as creatine phosphate. It acts as a homodimer in striated muscle as well as in other tissues, and as a heterodimer with a similar brain isozyme in heart. The encoded protein is a member of the ATP:guanido phosphotransferase protein family. creatine kinase, M-type NA
ENSG00000175206 NPPA 4878 The protein encoded by this gene belongs to the natriuretic peptide family. Natriuretic peptides are implicated in the control of extracellular fluid volume and electrolyte homeostasis. This protein is synthesized as a large precursor (containing a signal peptide), which is processed to release a peptide from the N-terminus with similarity to vasoactive peptide, cardiodilatin, and another peptide from the C-terminus with natriuretic-diuretic activity. Mutations in this gene have been associated with atrial fibrillation familial type 6. This gene is located adjacent to another member of the natriuretic family of peptides on chromosome 1. natriuretic peptide A NA
ENSG00000189058 APOD 347 This gene encodes a component of high density lipoprotein that has no marked similarity to other apolipoprotein sequences. It has a high degree of homology to plasma retinol-binding protein and other members of the alpha 2 microglobulin protein superfamily of carrier proteins, also known as lipocalins. This glycoprotein is closely associated with the enzyme lecithin:cholesterol acyltransferase - an enzyme involved in lipoprotein metabolism. apolipoprotein D NA
ENSG00000077522 ACTN2 88 Alpha actinins belong to the spectrin gene superfamily which represents a diverse group of cytoskeletal proteins, including the alpha and beta spectrins and dystrophins. Alpha actinin is an actin-binding protein with multiple roles in different cell types. In nonmuscle cells, the cytoskeletal isoform is found along microfilament bundles and adherens-type junctions, where it is involved in binding actin to the membrane. In contrast, skeletal, cardiac, and smooth muscle isoforms are localized to the Z-disc and analogous dense bodies, where they help anchor the myofibrillar actin filaments. This gene encodes a muscle-specific, alpha actinin isoform that is expressed in both skeletal and cardiac muscles. Several transcript variants encoding different isoforms have been found for this gene. actinin alpha 2 NA
ENSG00000168309 FAM107A 11170 NA family with sequence similarity 107 member A NA
ENSG00000131095 GFAP 2670 This gene encodes one of the major intermediate filament proteins of mature astrocytes. It is used as a marker to distinguish astrocytes from other glial cells during development. Mutations in this gene cause Alexander disease, a rare disorder of astrocytes in the central nervous system. Alternative splicing results in multiple transcript variants encoding distinct isoforms. glial fibrillary acidic protein NA
ENSG00000099194 SCD 6319 This gene encodes an enzyme involved in fatty acid biosynthesis, primarily the synthesis of oleic acid. The protein belongs to the fatty acid desaturase family and is an integral membrane protein located in the endoplasmic reticulum. Transcripts of approximately 3.9 and 5.2 kb, differing only by alternative polyadenylation signals, have been detected. A gene encoding a similar enzyme is located on chromosome 4 and a pseudogene of this gene is located on chromosome 17. stearoyl-CoA desaturase NA
ENSG00000021300 PLEKHB1 58473 NA pleckstrin homology domain containing B1 NA
ENSG00000130294 KIF1A 547 The protein encoded by this gene is a member of the kinesin family and functions as an anterograde motor protein that transports membranous organelles along axonal microtubules. Mutations at this locus have been associated with spastic paraplegia-30 and hereditary sensory neuropathy IIC. Alternatively spliced transcript variants encoding distinct isoforms have been described. kinesin family member 1A NA
ENSG00000167588 GPD1 2819 This gene encodes a member of the NAD-dependent glycerol-3-phosphate dehydrogenase family. The encoded protein plays a critical role in carbohydrate and lipid metabolism by catalyzing the reversible conversion of dihydroxyacetone phosphate (DHAP) and reduced nicotine adenine dinucleotide (NADH) to glycerol-3-phosphate (G3P) and NAD+. The encoded cytosolic protein and mitochondrial glycerol-3-phosphate dehydrogenase also form a glycerol phosphate shuttle that facilitates the transfer of reducing equivalents from the cytosol to mitochondria. Mutations in this gene are a cause of transient infantile hypertriglyceridemia. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene. glycerol-3-phosphate dehydrogenase 1 NA
ENSG00000137801 THBS1 7057 The protein encoded by this gene is a subunit of a disulfide-linked homotrimeric protein. This protein is an adhesive glycoprotein that mediates cell-to-cell and cell-to-matrix interactions. This protein can bind to fibrinogen, fibronectin, laminin, type V collagen and integrins alpha-V/beta-1. This protein has been shown to play roles in platelet aggregation, angiogenesis, and tumorigenesis. thrombospondin 1 NA
ENSG00000175445 LPL 4023 LPL encodes lipoprotein lipase, which is expressed in heart, muscle, and adipose tissue. LPL functions as a homodimer, and has the dual functions of triglyceride hydrolase and ligand/bridging factor for receptor-mediated lipoprotein uptake. Severe mutations that cause LPL deficiency result in type I hyperlipoproteinemia, while less extreme mutations in LPL are linked to many disorders of lipoprotein metabolism. lipoprotein lipase NA
ENSG00000175084 DES 1674 This gene encodes a muscle-specific class III intermediate filament. Homopolymers of this protein form a stable intracytoplasmic filamentous network connecting myofibrils to each other and to the plasma membrane. Mutations in this gene are associated with desmin-related myopathy, a familial cardiac and skeletal myopathy (CSM), and with distal myopathies. desmin NA
ENSG00000211445 GPX3 2878 This gene product belongs to the glutathione peroxidase family, which functions in the detoxification of hydrogen peroxide. It contains a selenocysteine (Sec) residue at its active site. The selenocysteine is encoded by the UGA codon, which normally signals translation termination. The 3’ UTR of Sec-containing genes have a common stem-loop structure, the sec insertion sequence (SECIS), which is necessary for the recognition of UGA as a Sec codon rather than as a stop signal. glutathione peroxidase 3 NA
ENSG00000198467 TPM2 7169 This gene encodes beta-tropomyosin, a member of the actin filament binding protein family, and mainly expressed in slow, type 1 muscle fibers. Mutations in this gene can alter the expression of other sarcomeric tropomyosin proteins, and cause cap disease, nemaline myopathy and distal arthrogryposis syndromes. Alternatively spliced transcript variants encoding different isoforms have been found for this gene. tropomyosin 2 (beta) NA
ENSG00000206172 HBA1 3039 The human alpha globin gene cluster located on chromosome 16 spans about 30 kb and includes seven loci: 5’- zeta - pseudozeta - mu - pseudoalpha-1 - alpha-2 - alpha-1 - theta - 3’. The alpha-2 (HBA2) and alpha-1 (HBA1) coding sequences are identical. These genes differ slightly over the 5’ untranslated regions and the introns, but they differ significantly over the 3’ untranslated regions. Two alpha chains plus two beta chains constitute HbA, which in normal adult life comprises about 97% of the total hemoglobin; alpha chains combine with delta chains to constitute HbA-2, which with HbF (fetal hemoglobin) makes up the remaining 3% of adult hemoglobin. Alpha thalassemias result from deletions of each of the alpha genes as well as deletions of both HBA2 and HBA1; some nondeletion alpha thalassemias have also been reported. hemoglobin subunit alpha 1 NA
ENSG00000106631 MYL7 58498 NA myosin light chain 7 NA
ENSG00000149925 ALDOA 226 The protein encoded by this gene, Aldolase A (fructose-bisphosphate aldolase), is a glycolytic enzyme that catalyzes the reversible conversion of fructose-1,6-bisphosphate to glyceraldehyde 3-phosphate and dihydroxyacetone phosphate. Three aldolase isozymes (A, B, and C), encoded by three different genes, are differentially expressed during development. Aldolase A is found in the developing embryo and is produced in even greater amounts in adult muscle. Aldolase A expression is repressed in adult liver, kidney and intestine and similar to aldolase C levels in brain and other nervous tissue. Aldolase A deficiency has been associated with myopathy and hemolytic anemia. Alternative splicing and alternative promoter usage results in multiple transcript variants. Related pseudogenes have been identified on chromosomes 3 and 10. aldolase, fructose-bisphosphate A NA
ENSG00000130203 APOE 348 The protein encoded by this gene is a major apoprotein of the chylomicron. It binds to a specific liver and peripheral cell receptor, and is essential for the normal catabolism of triglyceride-rich lipoprotein constituents. This gene maps to chromosome 19 in a cluster with the related apolipoprotein C1 and C2 genes. Mutations in this gene result in familial dysbetalipoproteinemia, or type III hyperlipoproteinemia (HLP III), in which increased plasma cholesterol and triglycerides are the consequence of impaired clearance of chylomicron and VLDL remnants. Alternative splicing results in multiple transcript variants. apolipoprotein E NA
ENSG00000112531 QKI 9444 The protein encoded by this gene is an RNA-binding protein that regulates pre-mRNA splicing, export of mRNAs from the nucleus, protein translation, and mRNA stability. The encoded protein is involved in myelinization and oligodendrocyte differentiation and may play a role in schizophrenia. Multiple transcript variants encoding different isoforms have been found for this gene. QKI, KH domain containing, RNA binding NA
ENSG00000143632 ACTA1 58 The product encoded by this gene belongs to the actin family of proteins, which are highly conserved proteins that play a role in cell motility, structure and integrity. Alpha, beta and gamma actin isoforms have been identified, with alpha actins being a major constituent of the contractile apparatus, while beta and gamma actins are involved in the regulation of cell motility. This actin is an alpha actin that is found in skeletal muscle. Mutations in this gene cause nemaline myopathy type 3, congenital myopathy with excess of thin myofilaments, congenital myopathy with cores, and congenital myopathy with fiber-type disproportion, diseases that lead to muscle fiber defects. actin, alpha 1, skeletal muscle NA
ENSG00000197971 MBP 4155 The protein encoded by the classic MBP gene is a major constituent of the myelin sheath of oligodendrocytes and Schwann cells in the nervous system. However, MBP-related transcripts are also present in the bone marrow and the immune system. These mRNAs arise from the long MBP gene (otherwise called ‘Golli-MBP’) that contains 3 additional exons located upstream of the classic MBP exons. Alternative splicing from the Golli and the MBP transcription start sites gives rise to 2 sets of MBP-related transcripts and gene products. The Golli mRNAs contain 3 exons unique to Golli-MBP, spliced in-frame to 1 or more MBP exons. They encode hybrid proteins that have N-terminal Golli aa sequence linked to MBP aa sequence. The second family of transcripts contain only MBP exons and produce the well characterized myelin basic proteins. This complex gene structure is conserved among species suggesting that the MBP transcription unit is an integral part of the Golli transcription unit and that this arrangement is important for the function and/or regulation of these genes. myelin basic protein NA
ENSG00000076555 ACACB 32 Acetyl-CoA carboxylase (ACC) is a complex multifunctional enzyme system. ACC is a biotin-containing enzyme which catalyzes the carboxylation of acetyl-CoA to malonyl-CoA, the rate-limiting step in fatty acid synthesis. ACC-beta is thought to control fatty acid oxidation by means of the ability of malonyl-CoA to inhibit carnitine-palmitoyl-CoA transferase I, the rate-limiting step in fatty acid uptake and oxidation by mitochondria. ACC-beta may be involved in the regulation of fatty acid oxidation, rather than fatty acid biosynthesis. There is evidence for the presence of two ACC-beta isoforms. acetyl-CoA carboxylase beta NA
ENSG00000136717 BIN1 274 This gene encodes several isoforms of a nucleocytoplasmic adaptor protein, one of which was initially identified as a MYC-interacting protein with features of a tumor suppressor. Isoforms that are expressed in the central nervous system may be involved in synaptic vesicle endocytosis and may interact with dynamin, synaptojanin, endophilin, and clathrin. Isoforms that are expressed in muscle and ubiquitously expressed isoforms localize to the cytoplasm and nucleus and activate a caspase-independent apoptotic process. Studies in mouse suggest that this gene plays an important role in cardiac muscle development. Alternate splicing of the gene results in several transcript variants encoding different isoforms. Aberrant splice variants expressed in tumor cell lines have also been described. bridging integrator 1 NA
ENSG00000118194 TNNT2 7139 The protein encoded by this gene is the tropomyosin-binding subunit of the troponin complex, which is located on the thin filament of striated muscles and regulates muscle contraction in response to alterations in intracellular calcium ion concentration. Mutations in this gene have been associated with familial hypertrophic cardiomyopathy as well as with dilated cardiomyopathy. Transcripts for this gene undergo alternative splicing that results in many tissue-specific isoforms, however, the full-length nature of some of these variants has not yet been determined. troponin T2, cardiac type NA
ENSG00000140416 TPM1 7168 This gene is a member of the tropomyosin family of highly conserved, widely distributed actin-binding proteins involved in the contractile system of striated and smooth muscles and the cytoskeleton of non-muscle cells. Tropomyosin is composed of two alpha-helical chains arranged as a coiled-coil. It is polymerized end to end along the two grooves of actin filaments and provides stability to the filaments. The encoded protein is one type of alpha helical chain that forms the predominant tropomyosin of striated muscle, where it also functions in association with the troponin complex to regulate the calcium-dependent interaction of actin and myosin during muscle contraction. In smooth muscle and non-muscle cells, alternatively spliced transcript variants encoding a range of isoforms have been described. Mutations in this gene are associated with type 3 familial hypertrophic cardiomyopathy. tropomyosin 1 (alpha) NA
ENSG00000121769 FABP3 2170 The intracellular fatty acid-binding proteins (FABPs) belong to a multigene family. FABPs are divided into at least three distinct types, namely the hepatic-, intestinal- and cardiac-type. They form 14-15 kDa proteins and are thought to participate in the uptake, intracellular metabolism and/or transport of long-chain fatty acids. They may also be responsible for the modulation of cell growth and proliferation. Fatty acid-binding protein 3 gene contains four exons and its function is to arrest growth of mammary epithelial cells. This gene is a candidate tumor suppressor gene for human breast cancer. Alternative splicing results in multiple transcript variants. fatty acid binding protein 3 NA
ENSG00000166165 CKB 1152 The protein encoded by this gene is a cytoplasmic enzyme involved in energy homeostasis. The encoded protein reversibly catalyzes the transfer of phosphate between ATP and various phosphogens such as creatine phosphate. It acts as a homodimer in brain as well as in other tissues, and as a heterodimer with a similar muscle isozyme in heart. The encoded protein is a member of the ATP:guanido phosphotransferase protein family. A pseudogene of this gene has been characterized. creatine kinase B NA
ENSG00000159251 ACTC1 70 Actins are highly conserved proteins that are involved in various types of cell motility. Polymerization of globular actin (G-actin) leads to a structural filament (F-actin) in the form of a two-stranded helix. Each actin can bind to four others. The protein encoded by this gene belongs to the actin family which is comprised of three main groups of actin isoforms, alpha, beta, and gamma. The alpha actins are found in muscle tissues and are a major constituent of the contractile apparatus. Defects in this gene have been associated with idiopathic dilated cardiomyopathy (IDC) and familial hypertrophic cardiomyopathy (FHC). actin, alpha, cardiac muscle 1 NA
ENSG00000170477 KRT4 3851 The protein encoded by this gene is a member of the keratin gene family. The type II cytokeratins consist of basic or neutral proteins which are arranged in pairs of heterotypic keratin chains coexpressed during differentiation of simple and stratified epithelial tissues. This type II cytokeratin is specifically expressed in differentiated layers of the mucosal and esophageal epithelia with family member KRT13. Mutations in these genes have been associated with White Sponge Nevus, characterized by oral, esophageal, and anal leukoplakia. The type II cytokeratins are clustered in a region of chromosome 12q12-q13. keratin 4 NA
ENSG00000111245 MYL2 4633 This gene encodes the regulatory light chain associated with cardiac myosin beta (or slow) heavy chain. Ca2+ triggers the phosphorylation of regulatory light chain that in turn triggers contraction. Mutations in this gene are associated with mid-left ventricular chamber type hypertrophic cardiomyopathy. myosin light chain 2 NA
ENSG00000075624 ACTB 60 This gene encodes one of six different actin proteins. Actins are highly conserved proteins that are involved in cell motility, structure, and integrity. This actin is a major constituent of the contractile apparatus and one of the two nonmuscle cytoskeletal actins. actin, beta NA
ENSG00000107331 ABCA2 20 The membrane-associated protein encoded by this gene is a member of the superfamily of ATP-binding cassette (ABC) transporters. ABC proteins transport various molecules across extra- and intracellular membranes. ABC genes are divided into seven distinct subfamilies (ABC1, MDR/TAP, MRP, ALD, OABP, GCN20, White). This protein is a member of the ABC1 subfamily. Members of the ABC1 subfamily comprise the only major ABC subfamily found exclusively in multicellular eukaryotes. This protein is highly expressed in brain tissue and may play a role in macrophage lipid metabolism and neural development. Two transcript variants encoding different isoforms have been found for this gene. ATP binding cassette subfamily A member 2 NA
ENSG00000014641 MDH1 4190 This gene encodes an enzyme that catalyzes the NAD/NADH-dependent, reversible oxidation of malate to oxaloacetate in many metabolic pathways, including the citric acid cycle. Two main isozymes are known to exist in eukaryotic cells: one is found in the mitochondrial matrix and the other in the cytoplasm. This gene encodes the cytosolic isozyme, which plays a key role in the malate-aspartate shuttle that allows malate to pass through the mitochondrial membrane to be transformed into oxaloacetate for further cellular processes. Alternatively spliced transcript variants have been found for this gene. A recent study showed that a C-terminally extended isoform is produced by use of an alternative in-frame translation termination codon via a stop codon readthrough mechanism, and that this isoform is localized in the peroxisomes. Pseudogenes have been identified on chromosomes X and 6. malate dehydrogenase 1 NA
ENSG00000117115 PADI2 11240 This gene encodes a member of the peptidyl arginine deiminase family of enzymes, which catalyze the post-translational deimination of proteins by converting arginine residues into citrullines in the presence of calcium ions. The family members have distinct substrate specificities and tissue-specific expression patterns. The type II enzyme is the most widely expressed family member. Known substrates for this enzyme include myelin basic protein in the central nervous system and vimentin in skeletal muscle and macrophages. This enzyme is thought to play a role in the onset and progression of neurodegenerative human disorders, including Alzheimer disease and multiple sclerosis, and it has also been implicated in glaucoma pathogenesis. This gene exists in a cluster with four other paralogous genes. peptidyl arginine deiminase 2 NA
ENSG00000242349 NPPA-AS1 ENSG00000242349 NA NPPA antisense RNA 1 NA
ENSG00000167468 GPX4 2879 This gene encodes a member of the glutathione peroxidase protein family. Glutathione peroxidase catalyzes the reduction of hydrogen peroxide, organic hydroperoxide, and lipid peroxides by reduced glutathione and functions in the protection of cells against oxidative damage. Human plasma glutathione peroxidase has been shown to be a selenium-containing enzyme and the UGA codon is translated into a selenocysteine. The encoded protein has been identified as a moonlighting protein based on its ability to serve dual functions as a peroxidase as well as a structural protein in mature spermatozoa. Through alternative splicing and transcription initiation, rat produces proteins that localize to the nucleus, mitochondrion, and cytoplasm. In humans, alternative transcription initiation and the cleavage sites of the mitochondrial and nuclear transit peptides need to be experimentally verified. Alternative splicing results in multiple transcript variants. glutathione peroxidase 4 NA
ENSG00000167460 TPM4 7171 This gene encodes a member of the tropomyosin family of actin-binding proteins involved in the contractile system of striated and smooth muscles and the cytoskeleton of non-muscle cells. Tropomyosins are dimers of coiled-coil proteins that polymerize end-to-end along the major groove in most actin filaments. They provide stability to the filaments and regulate access of other actin-binding proteins. In muscle cells, they regulate muscle contraction by controlling the binding of myosin heads to the actin filament. Multiple transcript variants encoding different isoforms have been found for this gene. tropomyosin 4 NA
ENSG00000148677 ANKRD1 27063 The protein encoded by this gene is localized to the nucleus of endothelial cells and is induced by IL-1 and TNF-alpha stimulation. Studies in rat cardiomyocytes suggest that this gene functions as a transcription factor. Interactions between this protein and the sarcomeric proteins myopalladin and titin suggest that it may also be involved in the myofibrillar stretch-sensor system. ankyrin repeat domain 1 NA
ENSG00000145284 SCD5 79966 Stearoyl-CoA desaturase (SCD; EC 1.14.99.5) is an integral membrane protein of the endoplasmic reticulum that catalyzes the formation of monounsaturated fatty acids from saturated fatty acids. SCD may be a key regulator of energy metabolism with a role in obesity and dyslipidemia. Four SCD isoforms, Scd1 through Scd4, have been identified in mouse. In contrast, only 2 SCD isoforms, SCD1 (MIM 604031) and SCD5, have been identified in human. SCD1 shares about 85% amino acid identity with all 4 mouse SCD isoforms, as well as with rat Scd1 and Scd2. In contrast, SCD5 shares limited homology with the rodent SCDs and appears to be unique to primates (Wang et al., 2005 [PubMed 15907797]). stearoyl-CoA desaturase 5 NA
ENSG00000122304 PRM2 5620 Protamines substitute for histones in the chromatin of sperm during the haploid phase of spermatogenesis, and are the major DNA-binding proteins in the nucleus of sperm in many vertebrates. They package the sperm DNA into a highly condensed complex in a volume less than 5% of a somatic cell nucleus. Many mammalian species have only one protamine (protamine 1); however, a few species, including human and mouse, have two. This gene encodes protamine 2, which is cleaved to give rise to a family of protamine 2 peptides. Alternatively spliced transcript variants have also been found for this gene. protamine 2 NA
ENSG00000169710 FASN 2194 The enzyme encoded by this gene is a multifunctional protein. Its main function is to catalyze the synthesis of palmitate from acetyl-CoA and malonyl-CoA, in the presence of NADPH, into long-chain saturated fatty acids. In some cancer cell lines, this protein has been found to be fused with estrogen receptor-alpha (ER-alpha), in which the N-terminus of FAS is fused in-frame with the C-terminus of ER-alpha. fatty acid synthase NA
ENSG00000152137 HSPB8 26353 The protein encoded by this gene belongs to the superfamily of small heat-shock proteins containing a conservative alpha-crystallin domain at the C-terminal part of the molecule. The expression of this gene is induced by estrogen in estrogen receptor-positive breast cancer cells, and this protein also functions as a chaperone in association with Bag3, a stimulator of macroautophagy. Thus, this gene appears to be involved in regulation of cell proliferation, apoptosis, and carcinogenesis, and mutations in this gene have been associated with different neuromuscular diseases, including Charcot-Marie-Tooth disease. heat shock protein family B (small) member 8 NA
ENSG00000155657 TTN 7273 This gene encodes a large abundant protein of striated muscle. The product of this gene is divided into two regions, a N-terminal I-band and a C-terminal A-band. The I-band, which is the elastic part of the molecule, contains two regions of tandem immunoglobulin domains on either side of a PEVK region that is rich in proline, glutamate, valine and lysine. The A-band, which is thought to act as a protein-ruler, contains a mixture of immunoglobulin and fibronectin repeats, and possesses kinase activity. An N-terminal Z-disc region and a C-terminal M-line region bind to the Z-line and M-line of the sarcomere, respectively, so that a single titin molecule spans half the length of a sarcomere. Titin also contains binding sites for muscle associated proteins so it serves as an adhesion template for the assembly of contractile machinery in muscle cells. It has also been identified as a structural protein for chromosomes. Alternative splicing of this gene results in multiple transcript variants. Considerable variability exists in the I-band, the M-line and the Z-disc regions of titin. Variability in the I-band region contributes to the differences in elasticity of different titin isoforms and, therefore, to the differences in elasticity of different muscle types. Mutations in this gene are associated with familial hypertrophic cardiomyopathy 9, and autoantibodies to titin are produced in patients with the autoimmune disease scleroderma. titin NA
ENSG00000145362 ANK2 287 This gene encodes a member of the ankyrin family of proteins that link the integral membrane proteins to the underlying spectrin-actin cytoskeleton. Ankyrins play key roles in activities such as cell motility, activation, proliferation, contact and the maintenance of specialized membrane domains. Most ankyrins are typically composed of three structural domains: an amino-terminal domain containing multiple ankyrin repeats; a central region with a highly conserved spectrin binding domain; and a carboxy-terminal regulatory domain which is the least conserved and subject to variation. The protein encoded by this gene is required for targeting and stability of Na/Ca exchanger 1 in cardiomyocytes. Mutations in this gene cause long QT syndrome 4 and cardiac arrhythmia syndrome. Multiple transcript variants encoding different isoforms have been described. ankyrin 2, neuronal NA
ENSG00000188536 HBA2 3040 The human alpha globin gene cluster located on chromosome 16 spans about 30 kb and includes seven loci: 5’- zeta - pseudozeta - mu - pseudoalpha-1 - alpha-2 - alpha-1 - theta - 3’. The alpha-2 (HBA2) and alpha-1 (HBA1) coding sequences are identical. These genes differ slightly over the 5’ untranslated regions and the introns, but they differ significantly over the 3’ untranslated regions. Two alpha chains plus two beta chains constitute HbA, which in normal adult life comprises about 97% of the total hemoglobin; alpha chains combine with delta chains to constitute HbA-2, which with HbF (fetal hemoglobin) makes up the remaining 3% of adult hemoglobin. Alpha thalassemias result from deletions of each of the alpha genes as well as deletions of both HBA2 and HBA1; some nondeletion alpha thalassemias have also been reported. hemoglobin subunit alpha 2 NA
ENSG00000256545 NA NA NA NA TRUE
ENSG00000184009 ACTG1 71 Actins are highly conserved proteins that are involved in various types of cell motility, and maintenance of the cytoskeleton. In vertebrates, three main groups of actin isoforms, alpha, beta and gamma have been identified. The alpha actins are found in muscle tissues and are a major constituent of the contractile apparatus. The beta and gamma actins co-exist in most cell types as components of the cytoskeleton, and as mediators of internal cell motility. Actin, gamma 1, encoded by this gene, is a cytoplasmic actin found in non-muscle cells. Mutations in this gene are associated with DFNA20/26, a subtype of autosomal dominant non-syndromic sensorineural progressive hearing loss. Alternative splicing results in multiple transcript variants. actin gamma 1 NA
ENSG00000087250 MT3 4504 NA metallothionein 3 NA
ENSG00000121653 MAPK8IP1 9479 This gene encodes a regulator of the pancreatic beta-cell function. It is highly similar to JIP-1, a mouse protein known to be a regulator of c-Jun amino-terminal kinase (Mapk8). This protein has been shown to prevent MAPK8 mediated activation of transcription factors, and to decrease IL-1 beta and MAP kinase kinase 1 (MEKK1) induced apoptosis in pancreatic beta cells. This protein also functions as a DNA-binding transactivator of the glucose transporter GLUT2. RE1-silencing transcription factor (REST) is reported to repress the expression of this gene in insulin-secreting beta cells. This gene is found to be mutated in a type 2 diabetes family, and thus is thought to be a susceptibility gene for type 2 diabetes. mitogen-activated protein kinase 8 interacting protein 1 NA
ENSG00000134571 MYBPC3 4607 MYBPC3 encodes the cardiac isoform of myosin-binding protein C. Myosin-binding protein C is a myosin-associated protein found in the cross-bridge-bearing zone (C region) of A bands in striated muscle. MYBPC3, the cardiac isoform, is expressed exclusively in heart muscle. Regulatory phosphorylation of the cardiac isoform in vivo by cAMP-dependent protein kinase (PKA) upon adrenergic stimulation may be linked to modulation of cardiac contraction. Mutations in MYBPC3 are one cause of familial hypertrophic cardiomyopathy. myosin binding protein C, cardiac NA
ENSG00000106624 AEBP1 165 This gene encodes a member of carboxypeptidase A protein family. The encoded protein may function as a transcriptional repressor and play a role in adipogenesis and smooth muscle cell differentiation. Studies in mice suggest that this gene functions in wound healing and abdominal wall development. Overexpression of this gene is associated with glioblastoma. AE binding protein 1 NA
ENSG00000171401 KRT13 3860 The protein encoded by this gene is a member of the keratin gene family. The keratins are intermediate filament proteins responsible for the structural integrity of epithelial cells and are subdivided into cytokeratins and hair keratins. Most of the type I cytokeratins consist of acidic proteins which are arranged in pairs of heterotypic keratin chains. This type I cytokeratin is paired with keratin 4 and expressed in the suprabasal layers of non-cornified stratified epithelia. Mutations in this gene and keratin 4 have been associated with the autosomal dominant disorder White Sponge Nevus. The type I cytokeratins are clustered in a region of chromosome 17q21.2. Alternative splicing of this gene results in multiple transcript variants; however, not all variants have been described. keratin 13 NA
ENSG00000163209 SPRR3 6707 NA small proline rich protein 3 NA
ENSG00000189043 NDUFA4 4697 The protein encoded by this gene belongs to the complex I 9kDa subunit family. Mammalian complex I of mitochondrial respiratory chain is composed of 45 different subunits. This protein has NADH dehydrogenase activity and oxidoreductase activity. It transfers electrons from NADH to the respiratory chain. The immediate electron acceptor for the enzyme is believed to be ubiquinone. NDUFA4, mitochondrial complex associated NA
ENSG00000018625 ATP1A2 477 The protein encoded by this gene belongs to the family of P-type cation transport ATPases, and to the subfamily of Na+/K+ -ATPases. Na+/K+ -ATPase is an integral membrane protein responsible for establishing and maintaining the electrochemical gradients of Na and K ions across the plasma membrane. These gradients are essential for osmoregulation, for sodium-coupled transport of a variety of organic and inorganic molecules, and for electrical excitability of nerve and muscle. This enzyme is composed of two subunits, a large catalytic subunit (alpha) and a smaller glycoprotein subunit (beta). The catalytic subunit of Na+/K+ -ATPase is encoded by multiple genes. This gene encodes an alpha 2 subunit. Mutations in this gene result in familial basilar or hemiplegic migraines, and in a rare syndrome known as alternating hemiplegia of childhood. ATPase Na+/K+ transporting subunit alpha 2 NA
ENSG00000092054 MYH7 4625 Muscle myosin is a hexameric protein containing 2 heavy chain subunits, 2 alkali light chain subunits, and 2 regulatory light chain subunits. This gene encodes the beta (or slow) heavy chain subunit of cardiac myosin. It is expressed predominantly in normal human ventricle. It is also expressed in skeletal muscle tissues rich in slow-twitch type I muscle fibers. Changes in the relative abundance of this protein and the alpha (or fast) heavy subunit of cardiac myosin correlate with the contractile velocity of cardiac muscle. Its expression is also altered during thyroid hormone depletion and hemodynamic overloading. Mutations in this gene are associated with familial hypertrophic cardiomyopathy, myosin storage myopathy, dilated cardiomyopathy, and Laing early-onset distal myopathy. myosin, heavy chain 7, cardiac muscle, beta NA
ENSG00000106366 SERPINE1 5054 This gene encodes a member of the serine proteinase inhibitor (serpin) superfamily. This member is the principal inhibitor of tissue plasminogen activator (tPA) and urokinase (uPA), and hence is an inhibitor of fibrinolysis. Defects in this gene are the cause of plasminogen activator inhibitor-1 deficiency (PAI-1 deficiency), and high concentrations of the gene product are associated with thrombophilia. Alternatively spliced transcript variants encoding different isoforms have been found for this gene. serpin family E member 1 NA
ENSG00000196616 ADH1B 125 The protein encoded by this gene is a member of the alcohol dehydrogenase family. Members of this enzyme family metabolize a wide variety of substrates, including ethanol, retinol, other aliphatic alcohols, hydroxysteroids, and lipid peroxidation products. This encoded protein, consisting of several homo- and heterodimers of alpha, beta, and gamma subunits, exhibits high activity for ethanol oxidation and plays a major role in ethanol catabolism. Three genes encoding alpha, beta and gamma subunits are tandemly organized in a genomic segment as a gene cluster. Two transcript variants encoding different isoforms have been found for this gene. alcohol dehydrogenase 1B (class I), beta polypeptide NA
ENSG00000168280 KIF5C 3800 The protein encoded by this gene is a kinesin heavy chain subunit involved in the transport of cargo within the central nervous system. The encoded protein, which acts as a tetramer by associating with another heavy chain and two light chains, interacts with protein kinase CK2. Mutations in this gene have been associated with complex cortical dysplasia with other brain malformations-2. Two transcript variants, one protein-coding and the other non-protein coding, have been found for this gene. kinesin family member 5C NA
ENSG00000106772 PRUNE2 158471 The protein encoded by this gene belongs to the B-cell CLL/lymphoma 2 and adenovirus E1B 19 kDa interacting family, whose members play roles in many cellular processes including apoptosis, cell transformation, and synaptic function. Several functions for this protein have been demonstrated including suppression of Ras homolog family member A activity, which results in reduced stress fiber formation and suppression of oncogenic cellular transformation. A high molecular weight isoform of this protein has also been shown to colocalize with Adaptor protein complex 2, beta-Adaptin and endodermal markers, suggesting an involvement in post-endocytic trafficking. In prostate cancer cells, this gene acts as a tumor suppressor and its expression is regulated by prostate cancer antigen 3, a non-protein coding gene on the opposite DNA strand in an intron of this gene. Prostate cancer antigen 3 regulates levels of this gene through formation of a double-stranded RNA that undergoes adenosine deaminase acting on RNA-dependent adenosine-to-inosine RNA editing. Alternative splicing results in multiple transcript variants. prune homolog 2 NA
ENSG00000161281 COX7A1 1346 Cytochrome c oxidase (COX), the terminal component of the mitochondrial respiratory chain, catalyzes the electron transfer from reduced cytochrome c to oxygen. This component is a heteromeric complex consisting of 3 catalytic subunits encoded by mitochondrial genes and multiple structural subunits encoded by nuclear genes. The mitochondrially-encoded subunits function in electron transfer, and the nuclear-encoded subunits may function in the regulation and assembly of the complex. This nuclear gene encodes polypeptide 1 (muscle isoform) of subunit VIIa and the polypeptide 1 is present only in muscle tissues. Other polypeptides of subunit VIIa are present in both muscle and nonmuscle tissues, and are encoded by different genes. cytochrome c oxidase subunit 7A1 NA
ENSG00000171992 SYNPO 11346 Synaptopodin is an actin-associated protein that may play a role in actin-based cell shape and motility. The name synaptopodin derives from the protein’s associations with postsynaptic densities and dendritic spines and with renal podocytes (Mundel et al., 1997 [PubMed 9314539]). synaptopodin NA
ENSG00000131771 PPP1R1B 84152 This gene encodes a bifunctional signal transduction molecule. Dopaminergic and glutamatergic receptor stimulation regulates its phosphorylation and function as a kinase or phosphatase inhibitor. As a target for dopamine, this gene may serve as a therapeutic target for neurologic and psychiatric disorders. Multiple transcript variants encoding different isoforms have been found for this gene. protein phosphatase 1 regulatory inhibitor subunit 1B NA
ENSG00000115306 SPTBN1 6711 Spectrin is an actin crosslinking and molecular scaffold protein that links the plasma membrane to the actin cytoskeleton, and functions in the determination of cell shape, arrangement of transmembrane proteins, and organization of organelles. It is composed of two antiparallel dimers of alpha- and beta- subunits. This gene is one member of a family of beta-spectrin genes. The encoded protein contains an N-terminal actin-binding domain, and 17 spectrin repeats which are involved in dimer formation. Multiple transcript variants encoding different isoforms have been found for this gene. spectrin beta, non-erythrocytic 1 NA
ENSG00000196091 MYBPC1 4604 This gene encodes a member of the myosin-binding protein C family. Myosin-binding protein C family members are myosin-associated proteins found in the cross-bridge-bearing zone (C region) of A bands in striated muscle. The encoded protein is the slow skeletal muscle isoform of myosin-binding protein C and plays an important role in muscle contraction by recruiting muscle-type creatine kinase to myosin filaments. Mutations in this gene are associated with distal arthrogryposis type I. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene. myosin binding protein C, slow type NA
ENSG00000160209 LOC105372824 105372824 NA uncharacterized LOC105372824 NA
ENSG00000160209 PDXK 8566 The protein encoded by this gene phosphorylates vitamin B6, a step required for the conversion of vitamin B6 to pyridoxal-5-phosphate, an important cofactor in intermediary metabolism. The encoded protein is cytoplasmic and probably acts as a homodimer. Alternatively spliced transcript variants have been described, but their biological validity has not been determined. pyridoxal (pyridoxine, vitamin B6) kinase NA
ENSG00000157827 FMNL2 114793 This gene encodes a formin-related protein. Formin-related proteins have been implicated in morphogenesis, cytokinesis, and cell polarity. Alternatively spliced transcript variants encoding different isoforms have been described but their full-length nature has yet to be determined. formin like 2 NA
ENSG00000089220 PEBP1 5037 This gene encodes a member of the phosphatidylethanolamine-binding family of proteins and has been shown to modulate multiple signaling pathways, including the MAP kinase (MAPK), NF-kappa B, and glycogen synthase kinase-3 (GSK-3) signaling pathways. The encoded protein can be further processed to form a smaller cleavage product, hippocampal cholinergic neurostimulating peptide (HCNP), which may be involved in neural development. This gene has been implicated in numerous human cancers and may act as a metastasis suppressor gene. Multiple pseudogenes of this gene have been identified in the genome. phosphatidylethanolamine binding protein 1 NA
ENSG00000179364 PACS2 23241 NA phosphofurin acidic cluster sorting protein 2 NA
ENSG00000078114 NEBL 10529 This gene encodes a nebulin like protein that is abundantly expressed in cardiac muscle. The encoded protein binds actin and interacts with thin filaments and Z-line associated proteins in striated muscle. This protein may be involved in cardiac myofibril assembly. A shorter isoform of this protein termed LIM nebulette is expressed in non-muscle cells and may function as a component of focal adhesion complexes. Alternate splicing results in multiple transcript variants. nebulette NA
ENSG00000266844 RP11-862L9.3 ENSG00000266844 NA NA NA
ENSG00000119927 GPAM 57678 This gene encodes a mitochondrial enzyme which prefers saturated fatty acids as its substrate for the synthesis of glycerolipids. This metabolic pathway’s first step is catalyzed by the encoded enzyme. Two forms for this enzyme exist, one in the mitochondria and one in the endoplasmic reticulum. Two alternatively spliced transcript variants have been described for this gene. glycerol-3-phosphate acyltransferase, mitochondrial NA
ENSG00000130176 CNN1 1264 NA calponin 1 NA
ENSG00000006282 SPATA20 64847 NA spermatogenesis associated 20 NA
ENSG00000107796 ACTA2 59 The protein encoded by this gene belongs to the actin family of proteins, which are highly conserved proteins that play a role in cell motility, structure and integrity. Alpha, beta and gamma actin isoforms have been identified, with alpha actins being a major constituent of the contractile apparatus, while beta and gamma actins are involved in the regulation of cell motility. This actin is an alpha actin that is found in skeletal muscle. Defects in this gene cause aortic aneurysm familial thoracic type 6. Multiple alternatively spliced variants, encoding the same protein, have been identified. actin, alpha 2, smooth muscle, aorta NA
ENSG00000198523 PLN 5350 The protein encoded by this gene is found as a pentamer and is a major substrate for the cAMP-dependent protein kinase in cardiac muscle. The encoded protein is an inhibitor of cardiac muscle sarcoplasmic reticulum Ca(2+)-ATPase in the unphosphorylated state, but inhibition is relieved upon phosphorylation of the protein. The subsequent activation of the Ca(2+) pump leads to enhanced muscle relaxation rates, thereby contributing to the inotropic response elicited in heart by beta-agonists. The encoded protein is a key regulator of cardiac diastolic function. Mutations in this gene are a cause of inherited human dilated cardiomyopathy with refractory congestive heart failure, and also familial hypertrophic cardiomyopathy. phospholamban NA
ENSG00000197893 NRAP 4892 NA nebulin related anchoring protein NA
ENSG00000064607 SUGP2 10147 This gene encodes a member of the arginine/serine-rich family of splicing factors. The encoded protein functions in mRNA processing. Alternatively spliced transcript variants have been described. SURP and G-patch domain containing 2 NA
ENSG00000237973 MTCO1P12 ENSG00000237973 NA MT-CO1 pseudogene 12 NA
ENSG00000111640 GAPDH 2597 This gene encodes a member of the glyceraldehyde-3-phosphate dehydrogenase protein family. The encoded protein has been identified as a moonlighting protein based on its ability to perform mechanistically distinct functions. The product of this gene catalyzes an important energy-yielding step in carbohydrate metabolism, the reversible oxidative phosphorylation of glyceraldehyde-3-phosphate in the presence of inorganic phosphate and nicotinamide adenine dinucleotide (NAD). The encoded protein has additionally been identified to have uracil DNA glycosylase activity in the nucleus. Also, this protein contains a peptide that has antimicrobial activity against E. coli, P. aeruginosa, and C. albicans. Studies of a similar protein in mouse have assigned a variety of additional functions including nitrosylation of nuclear proteins, the regulation of mRNA stability, and acting as a transferrin receptor on the cell surface of macrophage. Many pseudogenes similar to this locus are present in the human genome. Alternative splicing results in multiple transcript variants. glyceraldehyde-3-phosphate dehydrogenase NA
ENSG00000151729 SLC25A4 291 This gene is a member of the mitochondrial carrier subfamily of solute carrier protein genes. The product of this gene functions as a gated pore that translocates ADP from the cytoplasm into the mitochondrial matrix and ATP from the mitochondrial matrix into the cytoplasm. The protein forms a homodimer embedded in the inner mitochondrial membrane. Mutations in this gene have been shown to result in autosomal dominant progressive external ophthalmoplegia and familial hypertrophic cardiomyopathy. solute carrier family 25 member 4 NA
ENSG00000245532 NEAT1 283131 This gene produces a long non-coding RNA (lncRNA) transcribed from the multiple endocrine neoplasia locus. This lncRNA is retained in the nucleus where it forms the core structural component of the paraspeckle sub-organelles. It may act as a transcriptional regulator for numerous genes, including some genes involved in cancer progression. nuclear paraspeckle assembly transcript 1 (non-protein coding) NA
ENSG00000175646 PRM1 5619 NA protamine 1 NA
ENSG00000007237 GAS7 8522 Growth arrest-specific 7 is expressed primarily in terminally differentiated brain cells and predominantly in mature cerebellar Purkinje neurons. GAS7 plays a putative role in neuronal development. Several transcript variants encoding proteins which vary in the N-terminus have been described. growth arrest specific 7 NA
ENSG00000074800 ENO1 2023 This gene encodes alpha-enolase, one of three enolase isoenzymes found in mammals. Each isoenzyme is a homodimer composed of 2 alpha, 2 gamma, or 2 beta subunits, and functions as a glycolytic enzyme. Alpha-enolase in addition, functions as a structural lens protein (tau-crystallin) in the monomeric form. Alternative splicing of this gene results in a shorter isoform that has been shown to bind to the c-myc promoter and function as a tumor suppressor. Several pseudogenes have been identified, including one on the long arm of chromosome 1. Alpha-enolase has also been identified as an autoantigen in Hashimoto encephalopathy. enolase 1 NA
ENSG00000151552 QDPR 5860 This gene encodes the enzyme dihydropteridine reductase, which catalyzes the NADH-mediated reduction of quinonoid dihydrobiopterin. This enzyme is an essential component of the pterin-dependent aromatic amino acid hydroxylating systems. Mutations in this gene resulting in QDPR deficiency include aberrant splicing, amino acid substitutions, insertions, or premature terminations. Dihydropteridine reductase deficiency presents as atypical phenylketonuria due to insufficient production of biopterin, a cofactor for phenylalanine hydroxylase. quinoid dihydropteridine reductase NA
ENSG00000068903 SIRT2 22933 This gene encodes a member of the sirtuin family of proteins, homologs to the yeast Sir2 protein. Members of the sirtuin family are characterized by a sirtuin core domain and grouped into four classes. The functions of human sirtuins have not yet been determined; however, yeast sirtuin proteins are known to regulate epigenetic gene silencing and suppress recombination of rDNA. Studies suggest that the human sirtuins may function as intracellular regulatory proteins with mono-ADP-ribosyltransferase activity. The protein encoded by this gene is included in class I of the sirtuin family. Several transcript variants result from alternative splicing of this gene. sirtuin 2 NA
ENSG00000178814 OPLAH 26873 The protein encoded by this gene acts as a homodimer, using ATP hydrolysis to catalyze the conversion of 5-oxo-L-proline to L-glutamate. Defects in this gene are a cause of 5-oxoprolinase deficiency (OPLAHD). 5-oxoprolinase (ATP-hydrolysing) NA
ENSG00000171223 JUNB 3726 NA JunB proto-oncogene, AP-1 transcription factor subunit NA
ENSG00000095321 CRAT 1384 This gene encodes carnitine acetyltransferase (CRAT), which is a key enzyme in the metabolic pathway in mitochondria, peroxisomes and endoplasmic reticulum. CRAT catalyzes the reversible transfer of acyl groups from an acyl-CoA thioester to carnitine and regulates the ratio of acylCoA/CoA in the subcellular compartments. Two transcript variants encoding different isoforms have been found for this gene. carnitine O-acetyltransferase NA
ENSG00000105290 APLP1 333 This gene encodes a member of the highly conserved amyloid precursor protein gene family. The encoded protein is a membrane-associated glycoprotein that is cleaved by secretases in a manner similar to amyloid beta A4 precursor protein cleavage. This cleavage liberates an intracellular cytoplasmic fragment that may act as a transcriptional activator. The encoded protein may also play a role in synaptic maturation during cortical development. Alternatively spliced transcript variants encoding different isoforms have been described. amyloid beta precursor like protein 1 NA
ENSG00000155980 KIF5A 3798 This gene encodes a member of the kinesin family of proteins. Members of this family are part of a multisubunit complex that functions as a microtubule motor in intracellular organelle transport. Mutations in this gene cause autosomal dominant spastic paraplegia 10. kinesin family member 5A NA
ENSG00000129538 RNASE1 6035 This gene encodes a member of the pancreatic-type of secretory ribonucleases, a subset of the ribonuclease A superfamily. The encoded endonuclease cleaves internal phosphodiester RNA bonds on the 3’-side of pyrimidine bases. It prefers poly(C) as a substrate and hydrolyzes 2’,3’-cyclic nucleotides, with a pH optimum near 8.0. The encoded protein is monomeric and more commonly acts to degrade ds-RNA over ss-RNA. Alternative splicing occurs at this locus and four transcript variants encoding the same protein have been identified. ribonuclease A family member 1, pancreatic NA
ENSG00000198125 MB 4151 This gene encodes a member of the globin superfamily and is expressed in skeletal and cardiac muscles. The encoded protein is a haemoprotein contributing to intracellular oxygen storage and transcellular facilitated diffusion of oxygen. At least three alternatively spliced transcript variants encoding the same protein have been reported. myoglobin NA
ENSG00000166925 TSC22D4 81628 TSC22D4 is a member of the TSC22 domain family of leucine zipper transcriptional regulators (see TSC22D3; MIM 300506) (Kester et al., 1999 [PubMed 10488076]; Fiorenza et al., 2001 [PubMed 11707329]). TSC22 domain family member 4 NA
# Save the queried Ensembl IDs for factor 7, one per line.
write.table(out$query,
            file = paste0("../utilities/GTEX2013_sparse_load_sqrt/gene_names_clus_", 7, ".txt"),
            col.names = FALSE, row.names = FALSE, quote = FALSE)
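The same write step recurs for each factor, so it could be wrapped in a small helper. The sketch below is an assumption rather than part of the original analysis: `write_cluster_genes` and its arguments are hypothetical names, and the demo writes to a temporary directory instead of `../utilities/GTEX2013_sparse_load_sqrt`.

```r
# Hypothetical helper (not part of the original analysis): write the Ensembl IDs
# for one factor to gene_names_clus_<k>.txt, one ID per line.
write_cluster_genes <- function(ids, k, out_dir) {
  path <- file.path(out_dir, paste0("gene_names_clus_", k, ".txt"))
  write.table(data.frame(ids, stringsAsFactors = FALSE), file = path,
              col.names = FALSE, row.names = FALSE, quote = FALSE)
  invisible(path)  # return the path so callers can check or reuse it
}

# Demo with three IDs taken from the Factor 7 table above.
demo_ids <- c("ENSG00000171223", "ENSG00000095321", "ENSG00000105290")
demo_path <- write_cluster_genes(demo_ids, k = 7, out_dir = tempdir())
readLines(demo_path)
```

In the report itself, the call would presumably look like `write_cluster_genes(out$query, 7, "../utilities/GTEX2013_sparse_load_sqrt")`.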

Factor 8 Annotations

out <- mygene::queryMany(gene_list[8,],  scopes="ensembl.gene", fields=c("name", "summary", "symbol"), species="human");
## Finished
## Pass returnall=TRUE to return lists of duplicate or missing query terms.
kable(as.data.frame(out))
symbol X_id query name summary notfound
S100A9 6280 ENSG00000163220 S100 calcium binding protein A9 The protein encoded by this gene is a member of the S100 family of proteins containing 2 EF-hand calcium-binding motifs. S100 proteins are localized in the cytoplasm and/or nucleus of a wide range of cells, and involved in the regulation of a number of cellular processes such as cell cycle progression and differentiation. S100 genes include at least 13 members which are located as a cluster on chromosome 1q21. This protein may function in the inhibition of casein kinase and altered expression of this protein is associated with the disease cystic fibrosis. This antimicrobial protein exhibits antifungal and antibacterial activity. NA
CSF3R 1441 ENSG00000119535 colony stimulating factor 3 receptor The protein encoded by this gene is the receptor for colony stimulating factor 3, a cytokine that controls the production, differentiation, and function of granulocytes. The encoded protein, which is a member of the family of cytokine receptors, may also function in some cell surface adhesion or recognition processes. Alternatively spliced transcript variants have been described. Mutations in this gene are a cause of Kostmann syndrome, also known as severe congenital neutropenia. NA
IFITM2 10581 ENSG00000185201 interferon induced transmembrane protein 2 NA NA
SELL 6402 ENSG00000188404 selectin L This gene encodes a cell surface adhesion molecule that belongs to a family of adhesion/homing receptors. The encoded protein contains a C-type lectin-like domain, a calcium-binding epidermal growth factor-like domain, and two short complement-like repeats. The gene product is required for binding and subsequent rolling of leucocytes on endothelial cells, facilitating their migration into secondary lymphoid organs and inflammation sites. Single-nucleotide polymorphisms in this gene have been associated with various diseases including immunoglobulin A nephropathy. Alternatively spliced transcript variants have been found for this gene. NA
FPR1 2357 ENSG00000171051 formyl peptide receptor 1 This gene encodes a G protein-coupled receptor of mammalian phagocytic cells that is a member of the G-protein coupled receptor 1 family. The protein mediates the response of phagocytic cells to invasion of the host by microorganisms and is important in host defense and inflammation. NA
LCP1 3936 ENSG00000136167 lymphocyte cytosolic protein 1 Plastins are a family of actin-binding proteins that are conserved throughout eukaryote evolution and expressed in most tissues of higher eukaryotes. In humans, two ubiquitous plastin isoforms (L and T) have been identified. Plastin 1 (otherwise known as Fimbrin) is a third distinct plastin isoform which is specifically expressed at high levels in the small intestine. The L isoform is expressed only in hemopoietic cell lineages, while the T isoform has been found in all other normal cells of solid tissues that have replicative potential (fibroblasts, endothelial cells, epithelial cells, melanocytes, etc.). However, L-plastin has been found in many types of malignant human cells of non-hemopoietic origin suggesting that its expression is induced accompanying tumorigenesis in solid tissues. NA
MMP25 64386 ENSG00000008516 matrix metallopeptidase 25 Proteins of the matrix metalloproteinase (MMP) family are involved in the breakdown of extracellular matrix in normal physiological processes, such as embryonic development, reproduction, and tissue remodeling, as well as in disease processes, such as arthritis and metastasis. Most MMPs are secreted as inactive proproteins which are activated when cleaved by extracellular proteinases. However, the protein encoded by this gene is a member of the membrane-type MMP (MT-MMP) subfamily, attached to the plasma membrane via a glycosylphosphatidyl inositol anchor. In response to bacterial infection or inflammation, the encoded protein is thought to inactivate alpha-1 proteinase inhibitor, a major tissue protectant against proteolytic enzymes released by activated neutrophils, facilitating the transendothelial migration of neutrophils to inflammatory sites. The encoded protein may also play a role in tumor invasion and metastasis through activation of MMP2. The gene has previously been referred to as MMP20 but has been renamed MMP25. NA
VNN2 8875 ENSG00000112303 vanin 2 This gene product is a member of the Vanin family of proteins that share extensive sequence similarity with each other, and also with biotinidase. The family includes secreted and membrane-associated proteins, a few of which have been reported to participate in hematopoietic cell trafficking. No biotinidase activity has been demonstrated for any of the vanin proteins, however, they possess pantetheinase activity, which may play a role in oxidative-stress response. The encoded protein is a GPI-anchored cell surface molecule that plays a role in transendothelial migration of neutrophils. This gene lies in close proximity to, and in same transcriptional orientation as two other vanin genes on chromosome 6q23-q24. Alternatively spliced transcript variants encoding different isoforms have been described for this gene. NA
IL1R2 7850 ENSG00000115590 interleukin 1 receptor type 2 The protein encoded by this gene is a cytokine receptor that belongs to the interleukin 1 receptor family. This protein binds interleukin alpha (IL1A), interleukin beta (IL1B), and interleukin 1 receptor, type I(IL1R1/IL1RA), and acts as a decoy receptor that inhibits the activity of its ligands. Interleukin 4 (IL4) is reported to antagonize the activity of interleukin 1 by inducing the expression and release of this cytokine. This gene and three other genes form a cytokine receptor gene cluster on chromosome 2q12. Alternative splicing results in multiple transcript variants and protein isoforms. Alternative splicing produces both membrane-bound and soluble proteins. A soluble protein is also produced by proteolytic cleavage. NA
FCGR3B 2215 ENSG00000162747 Fc fragment of IgG receptor IIIb The protein encoded by this gene is a low affinity receptor for the Fc region of gamma immunoglobulins (IgG). The encoded protein acts as a monomer and can bind either monomeric or aggregated IgG. This gene may function to capture immune complexes in the peripheral circulation. Several transcript variants encoding different isoforms have been found for this gene. A highly-similar gene encoding a related protein is also found on chromosome 1. NA
C10orf54 64115 ENSG00000107738 chromosome 10 open reading frame 54 NA NA
AQP9 366 ENSG00000103569 aquaporin 9 The aquaporins are a family of water-selective membrane channels. This gene encodes a member of a subset of aquaporins called the aquaglyceroporins. This protein allows passage of a broad range of noncharged solutes and also stimulates urea transport and osmotic water permeability. This protein may also facilitate the uptake of glycerol in hepatic tissue. The encoded protein may also play a role in specialized leukocyte functions such as immunological response and bactericidal activity. Alternate splicing results in multiple transcript variants. NA
MNDA 4332 ENSG00000163563 myeloid cell nuclear differentiation antigen The myeloid cell nuclear differentiation antigen (MNDA) is detected only in nuclei of cells of the granulocyte-monocyte lineage. A 200-amino acid region of human MNDA is strikingly similar to a region in the proteins encoded by a family of interferon-inducible mouse genes, designated Ifi-201, Ifi-202, and Ifi-203, that are not regulated in a cell- or tissue-specific fashion. The 1.8-kb MNDA mRNA, which contains an interferon-stimulated response element in the 5-prime untranslated region, was significantly upregulated in human monocytes exposed to interferon alpha. MNDA is located within 2,200 kb of FCER1A, APCS, CRP, and SPTA1. In its pattern of expression and/or regulation, MNDA resembles IFI16, suggesting that these genes participate in blood cell-specific responses to interferons. NA
SERPINA1 5265 ENSG00000197249 serpin family A member 1 The protein encoded by this gene is secreted and is a serine protease inhibitor whose targets include elastase, plasmin, thrombin, trypsin, chymotrypsin, and plasminogen activator. Defects in this gene can cause emphysema or liver disease. Several transcript variants encoding the same protein have been found for this gene. NA
S100A11 6282 ENSG00000163191 S100 calcium binding protein A11 The protein encoded by this gene is a member of the S100 family of proteins containing 2 EF-hand calcium-binding motifs. S100 proteins are localized in the cytoplasm and/or nucleus of a wide range of cells, and involved in the regulation of a number of cellular processes such as cell cycle progression and differentiation. S100 genes include at least 13 members which are located as a cluster on chromosome 1q21. This protein may function in motility, invasion, and tubulin polymerization. Chromosomal rearrangements and altered expression of this gene have been implicated in tumor metastasis. NA
ALPL 249 ENSG00000162551 alkaline phosphatase, liver/bone/kidney This gene encodes a member of the alkaline phosphatase family of proteins. There are at least four distinct but related alkaline phosphatases: intestinal, placental, placental-like, and liver/bone/kidney (tissue non-specific). The first three are located together on chromosome 2, while the tissue non-specific form is located on chromosome 1. The product of this gene is a membrane bound glycosylated enzyme that is not expressed in any particular tissue and is, therefore, referred to as the tissue-nonspecific form of the enzyme. Alternative splicing results in multiple transcript variants, at least one of which encodes a preproprotein that is proteolytically processed to generate the mature enzyme. This enzyme may play a role in bone mineralization. Mutations in this gene have been linked to hypophosphatasia, a disorder that is characterized by hypercalcemia and skeletal defects. NA
CXCR1 3577 ENSG00000163464 C-X-C motif chemokine receptor 1 The protein encoded by this gene is a member of the G-protein-coupled receptor family. This protein is a receptor for interleukin 8 (IL8). It binds to IL8 with high affinity, and transduces the signal through a G-protein activated second messenger system. Knockout studies in mice suggested that this protein inhibits embryonic oligodendrocyte precursor migration in developing spinal cord. This gene, IL8RB, a gene encoding another high affinity IL8 receptor, as well as IL8RBP, a pseudogene of IL8RB, form a gene cluster in a region mapped to chromosome 2q33-q36. NA
MYH11 4629 ENSG00000133392 myosin, heavy chain 11, smooth muscle The protein encoded by this gene is a smooth muscle myosin belonging to the myosin heavy chain family. The gene product is a subunit of a hexameric protein that consists of two heavy chain subunits and two pairs of non-identical light chain subunits. It functions as a major contractile protein, converting chemical energy into mechanical energy through the hydrolysis of ATP. The gene encoding a human ortholog of rat NUDE1 is transcribed from the reverse strand of this gene, and its 3’ end overlaps with that of the latter. The pericentric inversion of chromosome 16 [inv(16)(p13q22)] produces a chimeric transcript that encodes a protein consisting of the first 165 residues from the N terminus of core-binding factor beta in a fusion with the C-terminal portion of the smooth muscle myosin heavy chain. This chromosomal rearrangement is associated with acute myeloid leukemia of the M4Eo subtype. Alternative splicing generates isoforms that are differentially expressed, with ratios changing during muscle cell maturation. Alternatively spliced transcript variants encoding different isoforms have been identified. NA
S100A8 6279 ENSG00000143546 S100 calcium binding protein A8 The protein encoded by this gene is a member of the S100 family of proteins containing 2 EF-hand calcium-binding motifs. S100 proteins are localized in the cytoplasm and/or nucleus of a wide range of cells, and involved in the regulation of a number of cellular processes such as cell cycle progression and differentiation. S100 genes include at least 13 members which are located as a cluster on chromosome 1q21. This protein may function in the inhibition of casein kinase and as a cytokine. Altered expression of this protein is associated with the disease cystic fibrosis. Multiple transcript variants encoding different isoforms have been found for this gene. NA
ARHGDIB 397 ENSG00000111348 Rho GDP dissociation inhibitor beta Members of the Rho (or ARH) protein family (see MIM 165390) and other Ras-related small GTP-binding proteins (see MIM 179520) are involved in diverse cellular events, including cell signaling, proliferation, cytoskeletal organization, and secretion. The GTP-binding proteins are active only in the GTP-bound state. At least 3 classes of proteins tightly regulate cycling between the GTP-bound and GDP-bound states: GTPase-activating proteins (GAPs), guanine nucleotide-releasing factors (GRFs), and GDP-dissociation inhibitors (GDIs). The GDIs, including ARHGDIB, decrease the rate of GDP dissociation from Ras-like GTPases (summary by Scherle et al., 1993 [PubMed 8356058]). NA
LAPTM5 7805 ENSG00000162511 lysosomal protein transmembrane 5 This gene encodes a transmembrane receptor that is associated with lysosomes. The encoded protein, also known as E3 protein, may play a role in hematopoiesis. NA
MYO1F 4542 ENSG00000142347 myosin IF NA NA
NCF2 4688 ENSG00000116701 neutrophil cytosolic factor 2 This gene encodes neutrophil cytosolic factor 2, the 67-kilodalton cytosolic subunit of the multi-protein NADPH oxidase complex found in neutrophils. This oxidase produces a burst of superoxide which is delivered to the lumen of the neutrophil phagosome. Mutations in this gene, as well as in other NADPH oxidase subunits, can result in chronic granulomatous disease, a disease that causes recurrent infections by catalase-positive organisms. Alternative splicing results in multiple transcript variants encoding different isoforms. NA
FCGR2A 2212 ENSG00000143226 Fc fragment of IgG receptor IIa This gene encodes one member of a family of immunoglobulin Fc receptor genes found on the surface of many immune response cells. The protein encoded by this gene is a cell surface receptor found on phagocytic cells such as macrophages and neutrophils, and is involved in the process of phagocytosis and clearing of immune complexes. Alternative splicing results in multiple transcript variants. NA
FCGR2C 9103 ENSG00000143226 Fc fragment of IgG receptor IIc (gene/pseudogene) This gene encodes one of three members of a family of low-affinity immunoglobulin gamma Fc receptors found on the surface of many immune response cells. The encoded protein is a transmembrane glycoprotein and may be involved in phagocytosis and clearing of immune complexes. An allelic polymorphism in this gene results in both coding and non-coding variants. NA
SRGN 5552 ENSG00000122862 serglycin This gene encodes a protein best known as a hematopoietic cell granule proteoglycan. Proteoglycans stored in the secretory granules of many hematopoietic cells also contain a protease-resistant peptide core, which may be important for neutralizing hydrolytic enzymes. This encoded protein was found to be associated with the macromolecular complex of granzymes and perforin, which may serve as a mediator of granule-mediated apoptosis. Two transcript variants, only one of them protein-coding, have been found for this gene. NA
SMAP2 64744 ENSG00000084070 small ArfGAP2 NA NA
NAMPT 10135 ENSG00000105835 nicotinamide phosphoribosyltransferase This gene encodes a protein that catalyzes the condensation of nicotinamide with 5-phosphoribosyl-1-pyrophosphate to yield nicotinamide mononucleotide, one step in the biosynthesis of nicotinamide adenine dinucleotide. The protein belongs to the nicotinic acid phosphoribosyltransferase (NAPRTase) family and is thought to be involved in many important biological processes, including metabolism, stress response and aging. This gene has a pseudogene on chromosome 10. NA
SLC11A1 6556 ENSG00000018280 solute carrier family 11 member 1 This gene is a member of the solute carrier family 11 (proton-coupled divalent metal ion transporters) family and encodes a multi-pass membrane protein. The protein functions as a divalent transition metal (iron and manganese) transporter involved in iron metabolism and host resistance to certain pathogens. Mutations in this gene have been associated with susceptibility to infectious diseases such as tuberculosis and leprosy, and inflammatory diseases such as rheumatoid arthritis and Crohn disease. Alternatively spliced variants that encode different protein isoforms have been described but the full-length nature of only one has been determined. NA
FGR 2268 ENSG00000000938 FGR proto-oncogene, Src family tyrosine kinase This gene is a member of the Src family of protein tyrosine kinases (PTKs). The encoded protein contains N-terminal sites for myristylation and palmitylation, a PTK domain, and SH2 and SH3 domains which are involved in mediating protein-protein interactions with phosphotyrosine-containing and proline-rich motifs, respectively. The protein localizes to plasma membrane ruffles, and functions as a negative regulator of cell migration and adhesion triggered by the beta-2 integrin signal transduction pathway. Infection with Epstein-Barr virus results in the overexpression of this gene. Multiple alternatively spliced variants, encoding the same protein, have been identified. NA
MYL9 10398 ENSG00000101335 myosin light chain 9 Myosin, a structural component of muscle, consists of two heavy chains and four light chains. The protein encoded by this gene is a myosin light chain that may regulate muscle contraction by modulating the ATPase activity of myosin heads. The encoded protein binds calcium and is activated by myosin light chain kinase. Two transcript variants encoding different isoforms have been found for this gene. NA
SELPLG 6404 ENSG00000110876 selectin P ligand This gene encodes a glycoprotein that functions as a high affinity counter-receptor for the cell adhesion molecules P-, E- and L- selectin expressed on myeloid cells and stimulated T lymphocytes. As such, this protein plays a critical role in leukocyte trafficking during inflammation by tethering of leukocytes to activated platelets or endothelia expressing selectins. This protein requires two post-translational modifications, tyrosine sulfation and the addition of the sialyl Lewis x tetrasaccharide (sLex) to its O-linked glycans, for its high-affinity binding activity. Aberrant expression of this gene and polymorphisms in this gene are associated with defects in the innate and adaptive immune response. Alternate splicing results in multiple transcript variants. NA
MMP9 4318 ENSG00000100985 matrix metallopeptidase 9 Proteins of the matrix metalloproteinase (MMP) family are involved in the breakdown of extracellular matrix in normal physiological processes, such as embryonic development, reproduction, and tissue remodeling, as well as in disease processes, such as arthritis and metastasis. Most MMPs are secreted as inactive proproteins which are activated when cleaved by extracellular proteinases. The enzyme encoded by this gene degrades type IV and V collagens. Studies in rhesus monkeys suggest that the enzyme is involved in IL-8-induced mobilization of hematopoietic progenitor cells from bone marrow, and murine studies suggest a role in tumor-associated tissue remodeling. NA
ALOX5AP 241 ENSG00000132965 arachidonate 5-lipoxygenase activating protein This gene encodes a protein which, with 5-lipoxygenase, is required for leukotriene synthesis. Leukotrienes are arachidonic acid metabolites which have been implicated in various types of inflammatory responses, including asthma, arthritis and psoriasis. This protein localizes to the plasma membrane. Inhibitors of its function impede translocation of 5-lipoxygenase from the cytoplasm to the cell membrane and inhibit 5-lipoxygenase activation. Alternatively spliced transcript variants encoding different isoforms have been identified for this gene. NA
HCK 3055 ENSG00000101336 HCK proto-oncogene, Src family tyrosine kinase The protein encoded by this gene is a member of the Src family of tyrosine kinases. This protein is primarily hemopoietic, particularly in cells of the myeloid and B-lymphoid lineages. It may help couple the Fc receptor to the activation of the respiratory burst. In addition, it may play a role in neutrophil migration and in the degranulation of neutrophils. Multiple isoforms with different subcellular distributions are produced due to both alternative splicing and the use of alternative translation initiation codons, including a non-AUG (CUG) codon. NA
SPI1 6688 ENSG00000066336 Spi-1 proto-oncogene This gene encodes an ETS-domain transcription factor that activates gene expression during myeloid and B-lymphoid cell development. The nuclear protein binds to a purine-rich sequence known as the PU-box found near the promoters of target genes, and regulates their expression in coordination with other transcription factors and cofactors. The protein can also regulate alternative splicing of target genes. Multiple transcript variants encoding different isoforms have been found for this gene. NA
TIMP3 7078 ENSG00000100234 TIMP metallopeptidase inhibitor 3 This gene belongs to the TIMP gene family. The proteins encoded by this gene family are inhibitors of the matrix metalloproteinases, a group of peptidases involved in degradation of the extracellular matrix (ECM). Expression of this gene is induced in response to mitogenic stimulation and this netrin domain-containing protein is localized to the ECM. Mutations in this gene have been associated with the autosomal dominant disorder Sorsby’s fundus dystrophy. NA
ITGB2 3689 ENSG00000160255 integrin subunit beta 2 This gene encodes an integrin beta chain, which combines with multiple different alpha chains to form different integrin heterodimers. Integrins are integral cell-surface proteins that participate in cell adhesion as well as cell-surface mediated signalling. The encoded protein plays an important role in immune response and defects in this gene cause leukocyte adhesion deficiency. Alternative splicing results in multiple transcript variants. NA
S100A12 6283 ENSG00000163221 S100 calcium binding protein A12 The protein encoded by this gene is a member of the S100 family of proteins containing 2 EF-hand calcium-binding motifs. S100 proteins are localized in the cytoplasm and/or nucleus of a wide range of cells, and involved in the regulation of a number of cellular processes such as cell cycle progression and differentiation. S100 genes include at least 13 members which are located as a cluster on chromosome 1q21. This protein is proposed to be involved in specific calcium-dependent signal transduction pathways and its regulatory effect on cytoskeletal components may modulate various neutrophil activities. The protein includes an antimicrobial peptide which has antibacterial activity. NA
FLOT2 2319 ENSG00000132589 flotillin 2 Caveolae are small domains on the inner cell membrane involved in vesicular trafficking and signal transduction. This gene encodes a caveolae-associated, integral membrane protein, which is thought to function in neuronal signaling. NA
XPO6 23214 ENSG00000169180 exportin 6 The protein encoded by this gene is a member of the importin-beta family. Members of this family are regulated by the GTPase Ran to mediate transport of cargo across the nuclear envelope. This protein has been shown to mediate nuclear export of profilin-actin complexes. A pseudogene of this gene is located on the long arm of chromosome 14. Alternative splicing results in multiple transcript variants that encode different protein isoforms. NA
LYZ 4069 ENSG00000090382 lysozyme This gene encodes human lysozyme, whose natural substrate is the bacterial cell wall peptidoglycan (cleaving the beta[1-4]glycosidic linkages between N-acetylmuramic acid and N-acetylglucosamine). Lysozyme is one of the antimicrobial agents found in human milk, and is also present in spleen, lung, kidney, white blood cells, plasma, saliva, and tears. The protein has antibacterial activity against a number of bacterial species. Missense mutations in this gene have been identified in heritable renal amyloidosis. NA
FCER1G 2207 ENSG00000158869 Fc fragment of IgE receptor Ig The high affinity IgE receptor is a key molecule involved in allergic reactions. It is a tetramer composed of 1 alpha, 1 beta, and 2 gamma chains. The gamma chains are also subunits of other Fc receptors. NA
PGD 5226 ENSG00000142657 phosphogluconate dehydrogenase 6-phosphogluconate dehydrogenase is the second dehydrogenase in the pentose phosphate shunt. Deficiency of this enzyme is generally asymptomatic, and the inheritance of this disorder is autosomal dominant. Hemolysis results from combined deficiency of 6-phosphogluconate dehydrogenase and 6-phosphogluconolactonase suggesting a synergism of the two enzymopathies. Several transcript variants encoding different isoforms have been found for this gene. NA
HLA-C 3107 ENSG00000204525 major histocompatibility complex, class I, C HLA-C belongs to the HLA class I heavy chain paralogues. This class I molecule is a heterodimer consisting of a heavy chain and a light chain (beta-2 microglobulin). The heavy chain is anchored in the membrane. Class I molecules play a central role in the immune system by presenting peptides derived from endoplasmic reticulum lumen. They are expressed in nearly all cells. The heavy chain is approximately 45 kDa and its gene contains 8 exons. Exon one encodes the leader peptide, exons 2 and 3 encode the alpha1 and alpha2 domain, which both bind the peptide, exon 4 encodes the alpha3 domain, exon 5 encodes the transmembrane region, and exons 6 and 7 encode the cytoplasmic tail. Polymorphisms within exon 2 and exon 3 are responsible for the peptide binding specificity of each class one molecule. Typing for these polymorphisms is routinely done for bone marrow and kidney transplantation. Over one hundred HLA-C alleles have been described. NA
LILRA5 353514 ENSG00000187116 leukocyte immunoglobulin like receptor A5 The protein encoded by this gene is a member of the leukocyte immunoglobulin-like receptor (LIR) family. LIR family members are known to have activating and inhibitory functions in leukocytes. Crosslink of this receptor protein on the surface of monocytes has been shown to induce calcium flux and secretion of several proinflammatory cytokines, which suggests the roles of this protein in triggering innate immune responses. This gene is one of the leukocyte receptor genes that form a gene cluster on the chromosomal region 19q13.4. Four alternatively spliced transcript variants encoding distinct isoforms have been described. NA
ITGAX 3687 ENSG00000140678 integrin subunit alpha X This gene encodes the integrin alpha X chain protein. Integrins are heterodimeric integral membrane proteins composed of an alpha chain and a beta chain. This protein combines with the beta 2 chain (ITGB2) to form a leukocyte-specific integrin referred to as inactivated-C3b (iC3b) receptor 4 (CR4). The alpha X beta 2 complex seems to overlap the properties of the alpha M beta 2 integrin in the adherence of neutrophils and monocytes to stimulated endothelium cells, and in the phagocytosis of complement coated particles. Two transcript variants encoding different isoforms have been found for this gene. NA
CALD1 800 ENSG00000122786 caldesmon 1 This gene encodes a calmodulin- and actin-binding protein that plays an essential role in the regulation of smooth muscle and nonmuscle contraction. The conserved domain of this protein possesses the binding activities to Ca(2+)-calmodulin, actin, tropomyosin, myosin, and phospholipids. This protein is a potent inhibitor of the actin-tropomyosin activated myosin MgATPase, and serves as a mediating factor for Ca(2+)-dependent inhibition of smooth muscle contraction. Alternative splicing of this gene results in multiple transcript variants encoding distinct isoforms. NA
HK3 3101 ENSG00000160883 hexokinase 3 Hexokinases phosphorylate glucose to produce glucose-6-phosphate, the first step in most glucose metabolism pathways. This gene encodes hexokinase 3. Similar to hexokinases 1 and 2, this allosteric enzyme is inhibited by its product glucose-6-phosphate. NA
HCLS1 3059 ENSG00000180353 hematopoietic cell-specific Lyn substrate 1 NA NA
CORO1A 11151 ENSG00000102879 coronin 1A This gene encodes a member of the WD repeat protein family. WD repeats are minimally conserved regions of approximately 40 amino acids typically bracketed by gly-his and trp-asp (GH-WD), which may facilitate formation of heterotrimeric or multiprotein complexes. Members of this family are involved in a variety of cellular processes, including cell cycle progression, signal transduction, apoptosis, and gene regulation. Alternative splicing results in multiple transcript variants. A related pseudogene has been defined on chromosome 16. NA
HBB 3043 ENSG00000244734 hemoglobin subunit beta The alpha (HBA) and beta (HBB) loci determine the structure of the 2 types of polypeptide chains in adult hemoglobin, Hb A. The normal adult hemoglobin tetramer consists of two alpha chains and two beta chains. Mutant beta globin causes sickle cell anemia. Absence of beta chain causes beta-zero-thalassemia. Reduced amounts of detectable beta globin causes beta-plus-thalassemia. The order of the genes in the beta-globin cluster is 5’-epsilon – gamma-G – gamma-A – delta – beta–3’. NA
CD53 963 ENSG00000143119 CD53 molecule The protein encoded by this gene is a member of the transmembrane 4 superfamily, also known as the tetraspanin family. Most of these members are cell-surface proteins that are characterized by the presence of four hydrophobic domains. The proteins mediate signal transduction events that play a role in the regulation of cell development, activation, growth and motility. This encoded protein is a cell surface glycoprotein that is known to complex with integrins. It contributes to the transduction of CD2-generated signals in T cells and natural killer cells and has been suggested to play a role in growth regulation. Familial deficiency of this gene has been linked to an immunodeficiency associated with recurrent infectious diseases caused by bacteria, fungi and viruses. Alternative splicing results in multiple transcript variants. NA
CST7 8530 ENSG00000077984 cystatin F The cystatin superfamily encompasses proteins that contain multiple cystatin-like sequences. Some of the members are active cysteine protease inhibitors, while others have lost or perhaps never acquired this inhibitory activity. There are three inhibitory families in the superfamily, including the type 1 cystatins (stefins), type 2 cystatins and the kininogens. The type 2 cystatin proteins are a class of cysteine proteinase inhibitors found in a variety of human fluids and secretions. This gene encodes a glycosylated cysteine protease inhibitor with a putative role in immune regulation through inhibition of a unique target in the hematopoietic system. Expression of the protein has been observed in various human cancer cell lines established from malignant tumors. NA
GPX3 2878 ENSG00000211445 glutathione peroxidase 3 This gene product belongs to the glutathione peroxidase family, which functions in the detoxification of hydrogen peroxide. It contains a selenocysteine (Sec) residue at its active site. The selenocysteine is encoded by the UGA codon, which normally signals translation termination. The 3’ UTR of Sec-containing genes have a common stem-loop structure, the sec insertion sequence (SECIS), which is necessary for the recognition of UGA as a Sec codon rather than as a stop signal. NA
SORL1 6653 ENSG00000137642 sortilin-related receptor, L(DLR class) A repeats containing This gene encodes a mosaic protein that belongs to at least two families: the vacuolar protein sorting 10 (VPS10) domain-containing receptor family, and the low density lipoprotein receptor (LDLR) family. The encoded protein also contains fibronectin type III repeats and an epidermal growth factor repeat. The encoded preproprotein is proteolytically processed to generate the mature receptor, which likely plays roles in endocytosis and sorting. Mutations in this gene may be associated with Alzheimer’s disease. NA
NA NA ENSG00000259716 NA NA TRUE
DCN 1634 ENSG00000011465 decorin This gene encodes a member of the small leucine-rich proteoglycan family of proteins. Alternative splicing results in multiple transcript variants, at least one of which encodes a preproprotein that is proteolytically processed to generate the mature protein. This protein plays a role in collagen fibril assembly. Binding of this protein to multiple cell surface receptors mediates its role in tumor suppression, including a stimulatory effect on autophagy and inflammation and an inhibitory effect on angiogenesis and tumorigenesis. This gene and the related gene biglycan are thought to be the result of a gene duplication. Mutations in this gene are associated with congenital stromal corneal dystrophy in human patients. NA
ARRB2 409 ENSG00000141480 arrestin beta 2 Members of arrestin/beta-arrestin protein family are thought to participate in agonist-mediated desensitization of G-protein-coupled receptors and cause specific dampening of cellular responses to stimuli such as hormones, neurotransmitters, or sensory signals. Arrestin beta 2, like arrestin beta 1, was shown to inhibit beta-adrenergic receptor function in vitro. It is expressed at high levels in the central nervous system and may play a role in the regulation of synaptic receptors. Besides the brain, a cDNA for arrestin beta 2 was isolated from thyroid gland, and thus it may also be involved in hormone-specific desensitization of TSH receptors. Multiple alternatively spliced transcript variants encoding different isoforms have been found for this gene. NA
GCA 25801 ENSG00000115271 grancalcin This gene product, grancalcin, is a calcium-binding protein abundant in neutrophils and macrophages. It belongs to the penta-EF-hand subfamily of proteins which includes sorcin, calpain, and ALG-2. Grancalcin localization is dependent upon calcium and magnesium. In the absence of divalent cation, grancalcin localizes to the cytosolic fraction; with magnesium alone, it partitions with the granule fraction; and in the presence of magnesium and calcium, it associates with both the granule and membrane fractions, suggesting a role for grancalcin in granule-membrane fusion and degranulation. NA
DOK3 79930 ENSG00000146094 docking protein 3 NA NA
NCF4 4689 ENSG00000100365 neutrophil cytosolic factor 4 The protein encoded by this gene is a cytosolic regulatory component of the superoxide-producing phagocyte NADPH-oxidase, a multicomponent enzyme system important for host defense. This protein is preferentially expressed in cells of myeloid lineage. It interacts primarily with neutrophil cytosolic factor 2 (NCF2/p67-phox) to form a complex with neutrophil cytosolic factor 1 (NCF1/p47-phox), which further interacts with the small G protein RAC1 and translocates to the membrane upon cell stimulation. This complex then activates flavocytochrome b, the membrane-integrated catalytic core of the enzyme system. The PX domain of this protein can bind phospholipid products of the PI(3) kinase, which suggests its role in PI(3) kinase-mediated signaling events. The phosphorylation of this protein was found to negatively regulate the enzyme activity. Alternatively spliced transcript variants encoding distinct isoforms have been observed. NA
TNS1 7145 ENSG00000079308 tensin 1 The protein encoded by this gene localizes to focal adhesions, regions of the plasma membrane where the cell attaches to the extracellular matrix. This protein crosslinks actin filaments and contains a Src homology 2 (SH2) domain, which is often found in molecules involved in signal transduction. This protein is a substrate of calpain II. Alternative splicing results in multiple transcript variants encoding different isoforms. NA
ICAM3 3385 ENSG00000076662 intercellular adhesion molecule 3 The protein encoded by this gene is a member of the intercellular adhesion molecule (ICAM) family. All ICAM proteins are type I transmembrane glycoproteins, contain 2-9 immunoglobulin-like C2-type domains, and bind to the leukocyte adhesion LFA-1 protein. This protein is constitutively and abundantly expressed by all leucocytes and may be the most important ligand for LFA-1 in the initiation of the immune response. It functions not only as an adhesion molecule, but also as a potent signalling molecule. Alternative splicing results in multiple transcript variants encoding different isoforms. NA
PYGL 5836 ENSG00000100504 phosphorylase, glycogen, liver This gene encodes a homodimeric protein that catalyses the cleavage of alpha-1,4-glucosidic bonds to release glucose-1-phosphate from liver glycogen stores. This protein switches from inactive phosphorylase B to active phosphorylase A by phosphorylation of serine residue 15. Activity of this enzyme is further regulated by multiple allosteric effectors and hormonal controls. Humans have three glycogen phosphorylase genes that encode distinct isozymes that are primarily expressed in liver, brain and muscle, respectively. The liver isozyme serves the glycemic demands of the body in general while the brain and muscle isozymes supply just those tissues. In glycogen storage disease type VI, also known as Hers disease, mutations in liver glycogen phosphorylase inhibit the conversion of glycogen to glucose and results in moderate hypoglycemia, mild ketosis, growth retardation and hepatomegaly. Alternative splicing results in multiple transcript variants encoding different isoforms. NA
HLA-B 3106 ENSG00000234745 major histocompatibility complex, class I, B HLA-B belongs to the HLA class I heavy chain paralogues. This class I molecule is a heterodimer consisting of a heavy chain and a light chain (beta-2 microglobulin). The heavy chain is anchored in the membrane. Class I molecules play a central role in the immune system by presenting peptides derived from the endoplasmic reticulum lumen. They are expressed in nearly all cells. The heavy chain is approximately 45 kDa and its gene contains 8 exons. Exon 1 encodes the leader peptide, exon 2 and 3 encode the alpha1 and alpha2 domains, which both bind the peptide, exon 4 encodes the alpha3 domain, exon 5 encodes the transmembrane region and exons 6 and 7 encode the cytoplasmic tail. Polymorphisms within exon 2 and exon 3 are responsible for the peptide binding specificity of each class one molecule. Typing for these polymorphisms is routinely done for bone marrow and kidney transplantation. Hundreds of HLA-B alleles have been described. NA
RAC2 5880 ENSG00000128340 ras-related C3 botulinum toxin substrate 2 (rho family, small GTP binding protein Rac2) This gene encodes a member of the Ras superfamily of small guanosine triphosphate (GTP)-metabolizing proteins. The encoded protein localizes to the plasma membrane, where it regulates diverse processes, such as secretion, phagocytosis, and cell polarization. Activity of this protein is also involved in the generation of reactive oxygen species. Mutations in this gene are associated with neutrophil immunodeficiency syndrome. There is a pseudogene for this gene on chromosome 6. NA
TYROBP 7305 ENSG00000011600 TYRO protein tyrosine kinase binding protein This gene encodes a transmembrane signaling polypeptide which contains an immunoreceptor tyrosine-based activation motif (ITAM) in its cytoplasmic domain. The encoded protein may associate with the killer-cell inhibitory receptor (KIR) family of membrane glycoproteins and may act as an activating signal transduction element. This protein may bind zeta-chain (TCR) associated protein kinase 70kDa (ZAP-70) and spleen tyrosine kinase (SYK) and play a role in signal transduction, bone modeling, brain myelination, and inflammation. Mutations within this gene have been associated with polycystic lipomembranous osteodysplasia with sclerosing leukoencephalopathy (PLOSL), also known as Nasu-Hakola disease. Its putative receptor, triggering receptor expressed on myeloid cells 2 (TREM2), also causes PLOSL. Multiple alternative transcript variants encoding distinct isoforms have been identified for this gene. NA
ABTB1 80325 ENSG00000114626 ankyrin repeat and BTB domain containing 1 This gene encodes a protein with an ankyrin repeat region and two BTB/POZ domains, which are thought to be involved in protein-protein interactions. Expression of this gene is activated by the phosphatase and tensin homolog, a tumor suppressor. Alternate splicing results in three transcript variants. NA
HLA-E 3133 ENSG00000204592 major histocompatibility complex, class I, E HLA-E belongs to the HLA class I heavy chain paralogues. This class I molecule is a heterodimer consisting of a heavy chain and a light chain (beta-2 microglobulin). The heavy chain is anchored in the membrane. HLA-E binds a restricted subset of peptides derived from the leader peptides of other class I molecules. The heavy chain is approximately 45 kDa and its gene contains 8 exons. Exon one encodes the leader peptide, exons 2 and 3 encode the alpha1 and alpha2 domains, which both bind the peptide, exon 4 encodes the alpha3 domain, exon 5 encodes the transmembrane region, and exons 6 and 7 encode the cytoplasmic tail. NA
ACSL1 2180 ENSG00000151726 acyl-CoA synthetase long-chain family member 1 The protein encoded by this gene is an isozyme of the long-chain fatty-acid-coenzyme A ligase family. Although differing in substrate specificity, subcellular localization, and tissue distribution, all isozymes of this family convert free long-chain fatty acids into fatty acyl-CoA esters, and thereby play a key role in lipid biosynthesis and fatty acid degradation. Several transcript variants encoding different isoforms have been found for this gene. NA
MSRB1 51734 ENSG00000198736 methionine sulfoxide reductase B1 This gene encodes a selenoprotein, which contains a selenocysteine (Sec) residue at its active site. The selenocysteine is encoded by the UGA codon that normally signals translation termination. The 3’ UTR of selenoprotein genes have a common stem-loop structure, the sec insertion sequence (SECIS), that is necessary for the recognition of UGA as a Sec codon rather than as a stop signal. This protein belongs to the methionine sulfoxide reductase (Msr) protein family which includes repair enzymes that reduce oxidized methionine residues in proteins. The protein encoded by this gene is expressed in a variety of adult and fetal tissues and localizes to the cell nucleus and cytosol. NA
GSN 2934 ENSG00000148180 gelsolin The protein encoded by this gene binds to the ‘plus’ ends of actin monomers and filaments to prevent monomer exchange. The encoded calcium-regulated protein functions in both assembly and disassembly of actin filaments. Defects in this gene are a cause of familial amyloidosis Finnish type (FAF). Multiple transcript variants encoding several different isoforms have been found for this gene. NA
GPSM3 63940 ENSG00000213654 G-protein signaling modulator 3 NA NA
FAM65B 9750 ENSG00000111913 family with sequence similarity 65 member B The protein encoded by this gene stimulates the formation of a non-mitotic multinucleate syncytium from proliferative cytotrophoblasts during trophoblast differentiation. Alternative splicing of this gene results in multiple transcript variants. NA
SERPINB1 1992 ENSG00000021355 serpin family B member 1 The protein encoded by this gene is a member of the serpin family of proteinase inhibitors. Members of this family maintain homeostasis by neutralizing overexpressed proteinase activity through their function as suicide substrates. This protein inhibits the neutrophil-derived proteinases neutrophil elastase, cathepsin G, and proteinase-3 and thus protects tissues from damage at inflammatory sites. Alternative splicing results in multiple transcript variants. NA
COL6A2 1292 ENSG00000142173 collagen type VI alpha 2 This gene encodes one of the three alpha chains of type VI collagen, a beaded filament collagen found in most connective tissues. The product of this gene contains several domains similar to von Willebrand Factor type A domains. These domains have been shown to bind extracellular matrix proteins, an interaction that explains the importance of this collagen in organizing matrix components. Mutations in this gene are associated with Bethlem myopathy and Ullrich scleroatonic muscular dystrophy. Three transcript variants have been identified for this gene. NA
CD177 57126 ENSG00000204936 CD177 molecule This gene encodes a glycosyl-phosphatidylinositol (GPI)-linked cell surface glycoprotein that plays a role in neutrophil activation. The protein can bind platelet endothelial cell adhesion molecule-1 and function in neutrophil transmigration. Mutations in this gene are associated with myeloproliferative diseases. Over-expression of this gene has been found in patients with polycythemia rubra vera. Autoantibodies against the protein may result in pulmonary transfusion reactions, and it may be involved in Wegener’s granulomatosis. A related pseudogene, which is adjacent to this gene on chromosome 19, has been identified. NA
PTRF 284119 ENSG00000177469 polymerase I and transcript release factor This gene encodes a protein that enables the dissociation of paused ternary polymerase I transcription complexes from the 3’ end of pre-rRNA transcripts. This protein regulates rRNA transcription by promoting the dissociation of transcription complexes and the reinitiation of polymerase I on nascent rRNA transcripts. This protein also localizes to caveolae at the plasma membrane and is thought to play a critical role in the formation of caveolae and the stabilization of caveolins. This protein translocates from caveolae to the cytoplasm after insulin stimulation. Caveolae contain truncated forms of this protein and may be the site of phosphorylation-dependent proteolysis. This protein is also thought to modify lipid metabolism and insulin-regulated gene expression. Mutations in this gene result in a disorder characterized by generalized lipodystrophy and muscular dystrophy. NA
COTL1 23406 ENSG00000103187 coactosin like F-actin binding protein 1 This gene encodes one of the numerous actin-binding proteins which regulate the actin cytoskeleton. This protein binds F-actin, and also interacts with 5-lipoxygenase, which is the first committed enzyme in leukotriene biosynthesis. Although this gene has been reported to map to chromosome 17 in the Smith-Magenis syndrome region, the best alignments for this gene are to chromosome 16. The Smith-Magenis syndrome region is the site of two related pseudogenes. NA
TNFRSF10C 8794 ENSG00000173535 tumor necrosis factor receptor superfamily member 10c The protein encoded by this gene is a member of the TNF-receptor superfamily. This receptor contains an extracellular TRAIL-binding domain and a transmembrane domain, but no cytoplasmic death domain. This receptor is not capable of inducing apoptosis, and is thought to function as an antagonistic receptor that protects cells from TRAIL-induced apoptosis. This gene was found to be a p53-regulated DNA damage-inducible gene. The expression of this gene was detected in many normal tissues but not in most cancer cell lines, which may explain the specific sensitivity of cancer cells to the apoptosis-inducing activity of TRAIL. NA
LITAF 9516 ENSG00000189067 lipopolysaccharide induced TNF factor Lipopolysaccharide is a potent stimulator of monocytes and macrophages, causing secretion of tumor necrosis factor-alpha (TNF-alpha) and other inflammatory mediators. This gene encodes lipopolysaccharide-induced TNF-alpha factor, which is a DNA-binding protein and can mediate the TNF-alpha expression by direct binding to the promoter region of the TNF-alpha gene. The transcription of this gene is induced by tumor suppressor p53 and has been implicated in the p53-induced apoptotic pathway. Mutations in this gene cause Charcot-Marie-Tooth disease type 1C (CMT1C) and may be involved in the carcinogenesis of extramammary Paget’s disease (EMPD). Multiple alternatively spliced transcript variants have been found for this gene. NA
TLR2 7097 ENSG00000137462 toll like receptor 2 The protein encoded by this gene is a member of the Toll-like receptor (TLR) family which plays a fundamental role in pathogen recognition and activation of innate immunity. TLRs are highly conserved from Drosophila to humans and share structural and functional similarities. This protein is a cell-surface protein that can form heterodimers with other TLR family members to recognize conserved molecules derived from microorganisms known as pathogen-associated molecular patterns (PAMPs). Activation of TLRs by PAMPs leads to an up-regulation of signaling pathways to modulate the host’s inflammatory response. This gene is also thought to promote apoptosis in response to bacterial lipoproteins. This gene has been implicated in the pathogenesis of several autoimmune diseases. Alternative splicing results in multiple transcript variants. NA
RHOG 391 ENSG00000177105 ras homolog family member G This gene encodes a member of the Rho family of small GTPases, which cycle between inactive GDP-bound and active GTP-bound states and function as molecular switches in signal transduction cascades. Rho proteins promote reorganization of the actin cytoskeleton and regulate cell shape, attachment, and motility. The encoded protein facilitates translocation of a functional guanine nucleotide exchange factor (GEF) complex from the cytoplasm to the plasma membrane where ras-related C3 botulinum toxin substrate 1 is activated to promote lamellipodium formation and cell migration. Two related pseudogenes have been identified on chromosomes 20 and X. NA
IL18RAP 8807 ENSG00000115607 interleukin 18 receptor accessory protein The protein encoded by this gene is an accessory subunit of the heterodimeric receptor for interleukin 18 (IL18), a proinflammatory cytokine involved in inducing cell-mediated immunity. This protein enhances the IL18-binding activity of the IL18 receptor and plays a role in signaling by IL18. Mutations in this gene are associated with Crohn’s disease and inflammatory bowel disease, and susceptibility to celiac disease and leprosy. Alternatively spliced transcript variants of this gene have been described, but their full-length nature is not known. NA
HSPG2 3339 ENSG00000142798 heparan sulfate proteoglycan 2 This gene encodes the perlecan protein, which consists of a core protein to which three long chains of glycosaminoglycans (heparan sulfate or chondroitin sulfate) are attached. The perlecan protein is a large multidomain proteoglycan that binds to and cross-links many extracellular matrix components and cell-surface molecules. It has been shown that this protein interacts with laminin, prolargin, collagen type IV, FGFBP1, FBLN2, FGF7 and transthyretin, etc., and it plays essential roles in multiple biological activities. Perlecan is a key component of the vascular extracellular matrix, where it helps to maintain the endothelial barrier function. It is a potent inhibitor of smooth muscle cell proliferation and is thus thought to help maintain vascular homeostasis. It can also promote growth factor (e.g., FGF2) activity and thus stimulate endothelial growth and re-generation. It is a major component of basement membranes, where it is involved in the stabilization of other molecules as well as being involved with glomerular permeability to macromolecules and cell adhesion. Mutations in this gene cause Schwartz-Jampel syndrome type 1, Silverman-Handmaker type of dyssegmental dysplasia, and tardive dyskinesia. Alternative splicing of this gene results in multiple transcript variants. NA
AHNAK 79026 ENSG00000124942 AHNAK nucleoprotein NA NA
BASP1 10409 ENSG00000176788 brain abundant membrane attached signal protein 1 This gene encodes a membrane bound protein with several transient phosphorylation sites and PEST motifs. Conservation of proteins with PEST sequences among different species supports their functional significance. PEST sequences typically occur in proteins with high turnover rates. Immunological characteristics of this protein are species specific. This protein also undergoes N-terminal myristoylation. Alternative splicing results in multiple transcript variants that encode the same protein. NA
PLBD1 79887 ENSG00000121316 phospholipase B domain containing 1 NA NA
NBEAL2 23218 ENSG00000160796 neurobeachin like 2 The protein encoded by this gene contains a beige and Chediak-Higashi (BEACH) domain and multiple WD40 domains, and may play a role in megakaryocyte alpha-granule biogenesis. Mutations in this gene are a cause of gray platelet syndrome. NA
RASSF2 9770 ENSG00000101265 Ras association domain family member 2 This gene encodes a protein that contains a Ras association domain. Similar to its cattle and sheep counterparts, this gene is located near the prion gene. Two alternatively spliced transcripts encoding the same isoform have been reported. NA
MMP25-AS1 ENSG00000261971 ENSG00000261971 MMP25 antisense RNA 1 NA NA
LRG1 116844 ENSG00000171236 leucine rich alpha-2-glycoprotein 1 The leucine-rich repeat (LRR) family of proteins, including LRG1, have been shown to be involved in protein-protein interaction, signal transduction, and cell adhesion and development. LRG1 is expressed during granulocyte differentiation (O’Donnell et al., 2002 [PubMed 12223515]). NA
SLA 6503 ENSG00000155926 Src-like-adaptor NA NA
SHKBP1 92799 ENSG00000160410 SH3KBP1 binding protein 1 NA NA
SERPING1 710 ENSG00000149131 serpin family G member 1 This gene encodes a highly glycosylated plasma protein involved in the regulation of the complement cascade. Its protein inhibits activated C1r and C1s of the first complement component and thus regulates complement activation. Deficiency of this protein is associated with hereditary angioneurotic oedema (HANE). Alternative splicing results in multiple transcript variants encoding the same isoform. NA
MYLK 4638 ENSG00000065534 myosin light chain kinase This gene, a muscle member of the immunoglobulin gene superfamily, encodes myosin light chain kinase which is a calcium/calmodulin dependent enzyme. This kinase phosphorylates myosin regulatory light chains to facilitate myosin interaction with actin filaments to produce contractile activity. This gene encodes both smooth muscle and nonmuscle isoforms. In addition, using a separate promoter in an intron in the 3’ region, it encodes telokin, a small protein identical in sequence to the C-terminus of myosin light chain kinase, that is independently expressed in smooth muscle and functions to stabilize unphosphorylated myosin filaments. A pseudogene is located on the p arm of chromosome 3. Four transcript variants that produce four isoforms of the calcium/calmodulin dependent enzyme have been identified as well as two transcripts that produce two isoforms of telokin. Additional variants have been identified but lack full length transcripts. NA
ITM2B 9445 ENSG00000136156 integral membrane protein 2B Amyloid precursor proteins are processed by beta-secretase and gamma-secretase to produce beta-amyloid peptides which form the characteristic plaques of Alzheimer disease. This gene encodes a transmembrane protein which is processed at the C-terminus by furin or furin-like proteases to produce a small secreted peptide which inhibits the deposition of beta-amyloid. Mutations which result in extension of the C-terminal end of the encoded protein, thereby increasing the size of the secreted peptide, are associated with two neurodegenerative diseases, familial British dementia and familial Danish dementia. NA
RHOB 388 ENSG00000143878 ras homolog family member B NA NA
CAP1 10487 ENSG00000131236 CAP, adenylate cyclase-associated protein 1 (yeast) The protein encoded by this gene is related to the S. cerevisiae CAP protein, which is involved in the cyclic AMP pathway. The human protein is able to interact with other molecules of the same protein, as well as with CAP2 and actin. Alternatively spliced transcript variants have been identified. NA
TALDO1 6888 ENSG00000177156 transaldolase 1 Transaldolase 1 is a key enzyme of the nonoxidative pentose phosphate pathway providing ribose-5-phosphate for nucleic acid synthesis and NADPH for lipid biosynthesis. This pathway can also maintain glutathione at a reduced state and thus protect sulfhydryl groups and cellular integrity from oxygen radicals. The functional gene of transaldolase 1 is located on chromosome 11 and a pseudogene is identified on chromosome 1 but there are conflicting map locations. The second and third exon of this gene were developed by insertion of a retrotransposable element. This gene is thought to be involved in multiple sclerosis. NA
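One row in the table above (ENSG00000259716) has NA in every annotation field and TRUE in the last column, which appears to be the `notfound` flag that `mygene::queryMany` adds when a query ID has no match. The sketch below shows one way such rows could be separated from the matched ones. It runs on a small made-up data frame rather than the real query result, and the column names (`symbol`, `query`, `notfound`) are assumed to mirror `as.data.frame(out)`.

```r
# Made-up stand-in for as.data.frame(out); the column names are an assumption.
annot <- data.frame(
  symbol   = c("HBB", NA, "DCN"),
  query    = c("ENSG00000244734", "ENSG00000259716", "ENSG00000011465"),
  notfound = c(NA, TRUE, NA),
  stringsAsFactors = FALSE
)

# Only an explicit TRUE marks a miss; NA in notfound means the query matched.
is_missing <- !is.na(annot$notfound) & annot$notfound

matched   <- annot[!is_missing, ]      # rows that carry annotations
unmatched <- annot$query[is_missing]   # Ensembl IDs with no match
```

Keeping `unmatched` as a separate vector makes it easy to report how many of the top genes per factor could not be annotated.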
# Save the factor 8 query IDs (Ensembl gene IDs) to a text file, one ID per line.
write.table(as.character(out$query),
            paste0("../utilities/GTEX2013_sparse_load_sqrt/gene_names_clus_", 8, ".txt"),
            col.names = FALSE, row.names = FALSE, quote = FALSE)
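
The `write.table` call above hard-codes the factor index in the file name. A minimal sketch of a reusable wrapper follows. The function name `write_factor_genes` and its arguments are hypothetical and not part of the original analysis, and the demonstration writes to a temporary directory instead of `../utilities/`.

```r
# Hypothetical helper: write one factor's gene IDs to
# <out_dir>/gene_names_clus_<k>.txt, one ID per line.
write_factor_genes <- function(ids, k, out_dir) {
  path <- file.path(out_dir, paste0("gene_names_clus_", k, ".txt"))
  write.table(as.character(ids), path,
              col.names = FALSE, row.names = FALSE, quote = FALSE)
  invisible(path)
}

# Demonstration with two IDs taken from the factor 8 table.
demo_path <- write_factor_genes(c("ENSG00000244734", "ENSG00000143119"),
                                k = 8, out_dir = tempdir())
demo_ids <- readLines(demo_path)
```

With a helper like this, the per-factor export could be called as `write_factor_genes(out$query, 8, "../utilities/GTEX2013_sparse_load_sqrt")` after each query, so the index and the file name cannot drift apart.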

Factor 9 Annotations

out <- mygene::queryMany(gene_list[9,],  scopes="ensembl.gene", fields=c("name", "summary", "symbol"), species="human");
## Finished
## Pass returnall=TRUE to return lists of duplicate or missing query terms.
kable(as.data.frame(out))
query X_id name summary symbol
ENSG00000175084 1674 desmin This gene encodes a muscle-specific class III intermediate filament. Homopolymers of this protein form a stable intracytoplasmic filamentous network connecting myofibrils to each other and to the plasma membrane. Mutations in this gene are associated with desmin-related myopathy, a familial cardiac and skeletal myopathy (CSM), and with distal myopathies. DES
ENSG00000019582 972 CD74 molecule The protein encoded by this gene associates with class II major histocompatibility complex (MHC) and is an important chaperone that regulates antigen presentation for immune response. It also serves as cell surface receptor for the cytokine macrophage migration inhibitory factor (MIF) which, when bound to the encoded protein, initiates survival pathways and cell proliferation. This protein also interacts with amyloid precursor protein (APP) and suppresses the production of amyloid beta (Abeta). Multiple alternatively spliced transcript variants encoding different isoforms have been identified. CD74
ENSG00000197971 4155 myelin basic protein The protein encoded by the classic MBP gene is a major constituent of the myelin sheath of oligodendrocytes and Schwann cells in the nervous system. However, MBP-related transcripts are also present in the bone marrow and the immune system. These mRNAs arise from the long MBP gene (otherwise called ‘Golli-MBP’) that contains 3 additional exons located upstream of the classic MBP exons. Alternative splicing from the Golli and the MBP transcription start sites gives rise to 2 sets of MBP-related transcripts and gene products. The Golli mRNAs contain 3 exons unique to Golli-MBP, spliced in-frame to 1 or more MBP exons. They encode hybrid proteins that have N-terminal Golli aa sequence linked to MBP aa sequence. The second family of transcripts contain only MBP exons and produce the well characterized myelin basic proteins. This complex gene structure is conserved among species suggesting that the MBP transcription unit is an integral part of the Golli transcription unit and that this arrangement is important for the function and/or regulation of these genes. MBP
ENSG00000166710 567 beta-2-microglobulin This gene encodes a serum protein found in association with the major histocompatibility complex (MHC) class I heavy chain on the surface of nearly all nucleated cells. The protein has a predominantly beta-pleated sheet structure that can form amyloid fibrils in some pathological conditions. The encoded antimicrobial protein displays antibacterial activity in amniotic fluid. A mutation in this gene has been shown to result in hypercatabolic hypoproteinemia. B2M
ENSG00000244734 3043 hemoglobin subunit beta The alpha (HBA) and beta (HBB) loci determine the structure of the 2 types of polypeptide chains in adult hemoglobin, Hb A. The normal adult hemoglobin tetramer consists of two alpha chains and two beta chains. Mutant beta globin causes sickle cell anemia. Absence of beta chain causes beta-zero-thalassemia. Reduced amounts of detectable beta globin causes beta-plus-thalassemia. The order of the genes in the beta-globin cluster is 5’-epsilon – gamma-G – gamma-A – delta – beta–3’. HBB
ENSG00000204287 3122 major histocompatibility complex, class II, DR alpha HLA-DRA is one of the HLA class II alpha chain paralogues. This class II molecule is a heterodimer consisting of an alpha and a beta chain, both anchored in the membrane. It plays a central role in the immune system by presenting peptides derived from extracellular proteins. Class II molecules are expressed in antigen presenting cells (APC: B lymphocytes, dendritic cells, macrophages). The alpha chain is approximately 33-35 kDa and its gene contains 5 exons. Exon 1 encodes the leader peptide, exons 2 and 3 encode the two extracellular domains, and exon 4 encodes the transmembrane domain and the cytoplasmic tail. DRA does not have polymorphisms in the peptide binding part and acts as the sole alpha chain for DRB1, DRB3, DRB4 and DRB5. HLA-DRA
ENSG00000198467 7169 tropomyosin 2 (beta) This gene encodes beta-tropomyosin, a member of the actin filament binding protein family, and mainly expressed in slow, type 1 muscle fibers. Mutations in this gene can alter the expression of other sarcomeric tropomyosin proteins, and cause cap disease, nemaline myopathy and distal arthrogryposis syndromes. Alternatively spliced transcript variants encoding different isoforms have been found for this gene. TPM2
ENSG00000204592 3133 major histocompatibility complex, class I, E HLA-E belongs to the HLA class I heavy chain paralogues. This class I molecule is a heterodimer consisting of a heavy chain and a light chain (beta-2 microglobulin). The heavy chain is anchored in the membrane. HLA-E binds a restricted subset of peptides derived from the leader peptides of other class I molecules. The heavy chain is approximately 45 kDa and its gene contains 8 exons. Exon one encodes the leader peptide, exons 2 and 3 encode the alpha1 and alpha2 domains, which both bind the peptide, exon 4 encodes the alpha3 domain, exon 5 encodes the transmembrane region, and exons 6 and 7 encode the cytoplasmic tail. HLA-E
ENSG00000128591 2318 filamin C This gene encodes one of three related filamin genes, specifically gamma filamin. These filamin proteins crosslink actin filaments into orthogonal networks in cortical cytoplasm and participate in the anchoring of membrane proteins for the actin cytoskeleton. Three functional domains exist in filamin: an N-terminal filamentous actin-binding domain, a C-terminal self-association domain, and a membrane glycoprotein-binding domain. Two transcript variants encoding different isoforms have been found for this gene. FLNC
ENSG00000184009 71 actin gamma 1 Actins are highly conserved proteins that are involved in various types of cell motility, and maintenance of the cytoskeleton. In vertebrates, three main groups of actin isoforms, alpha, beta and gamma have been identified. The alpha actins are found in muscle tissues and are a major constituent of the contractile apparatus. The beta and gamma actins co-exist in most cell types as components of the cytoskeleton, and as mediators of internal cell motility. Actin, gamma 1, encoded by this gene, is a cytoplasmic actin found in non-muscle cells. Mutations in this gene are associated with DFNA20/26, a subtype of autosomal dominant non-syndromic sensorineural progressive hearing loss. Alternative splicing results in multiple transcript variants. ACTG1
ENSG00000188536 3040 hemoglobin subunit alpha 2 The human alpha globin gene cluster located on chromosome 16 spans about 30 kb and includes seven loci: 5’- zeta - pseudozeta - mu - pseudoalpha-1 - alpha-2 - alpha-1 - theta - 3’. The alpha-2 (HBA2) and alpha-1 (HBA1) coding sequences are identical. These genes differ slightly over the 5’ untranslated regions and the introns, but they differ significantly over the 3’ untranslated regions. Two alpha chains plus two beta chains constitute HbA, which in normal adult life comprises about 97% of the total hemoglobin; alpha chains combine with delta chains to constitute HbA-2, which with HbF (fetal hemoglobin) makes up the remaining 3% of adult hemoglobin. Alpha thalassemias result from deletions of each of the alpha genes as well as deletions of both HBA2 and HBA1; some nondeletion alpha thalassemias have also been reported. HBA2
ENSG00000204983 5644 protease, serine 1 This gene encodes a trypsinogen, which is a member of the trypsin family of serine proteases. This enzyme is secreted by the pancreas and cleaved to its active form in the small intestine. It is active on peptide linkages involving the carboxyl group of lysine or arginine. Mutations in this gene are associated with hereditary pancreatitis. This gene and several other trypsinogen genes are localized to the T cell receptor beta locus on chromosome 7. PRSS1
ENSG00000091704 1357 carboxypeptidase A1 This gene encodes a member of the carboxypeptidase A family of zinc metalloproteases. This enzyme is produced in the pancreas and preferentially cleaves C-terminal branched-chain and aromatic amino acids from dietary proteins. This gene and several family members are present in a gene cluster on chromosome 7. Mutations in this gene may be linked to chronic pancreatitis, while elevated protein levels may be associated with pancreatic cancer. CPA1
ENSG00000204525 3107 major histocompatibility complex, class I, C HLA-C belongs to the HLA class I heavy chain paralogues. This class I molecule is a heterodimer consisting of a heavy chain and a light chain (beta-2 microglobulin). The heavy chain is anchored in the membrane. Class I molecules play a central role in the immune system by presenting peptides derived from the endoplasmic reticulum lumen. They are expressed in nearly all cells. The heavy chain is approximately 45 kDa and its gene contains 8 exons. Exon one encodes the leader peptide, exons 2 and 3 encode the alpha1 and alpha2 domains, which both bind the peptide, exon 4 encodes the alpha3 domain, exon 5 encodes the transmembrane region, and exons 6 and 7 encode the cytoplasmic tail. Polymorphisms within exon 2 and exon 3 are responsible for the peptide binding specificity of each class one molecule. Typing for these polymorphisms is routinely done for bone marrow and kidney transplantation. Over one hundred HLA-C alleles have been described. HLA-C
ENSG00000234745 3106 major histocompatibility complex, class I, B HLA-B belongs to the HLA class I heavy chain paralogues. This class I molecule is a heterodimer consisting of a heavy chain and a light chain (beta-2 microglobulin). The heavy chain is anchored in the membrane. Class I molecules play a central role in the immune system by presenting peptides derived from the endoplasmic reticulum lumen. They are expressed in nearly all cells. The heavy chain is approximately 45 kDa and its gene contains 8 exons. Exon 1 encodes the leader peptide, exons 2 and 3 encode the alpha1 and alpha2 domains, which both bind the peptide, exon 4 encodes the alpha3 domain, exon 5 encodes the transmembrane region and exons 6 and 7 encode the cytoplasmic tail. Polymorphisms within exon 2 and exon 3 are responsible for the peptide binding specificity of each class one molecule. Typing for these polymorphisms is routinely done for bone marrow and kidney transplantation. Hundreds of HLA-B alleles have been described. HLA-B
ENSG00000172403 171024 synaptopodin 2 NA SYNPO2
ENSG00000231389 3113 major histocompatibility complex, class II, DP alpha 1 HLA-DPA1 belongs to the HLA class II alpha chain paralogues. This class II molecule is a heterodimer consisting of an alpha (DPA) and a beta (DPB) chain, both anchored in the membrane. It plays a central role in the immune system by presenting peptides derived from extracellular proteins. Class II molecules are expressed in antigen presenting cells (APC: B lymphocytes, dendritic cells, macrophages). The alpha chain is approximately 33-35 kDa and its gene contains 5 exons. Exon one encodes the leader peptide, exons 2 and 3 encode the two extracellular domains, exon 4 encodes the transmembrane domain and the cytoplasmic tail. Within the DP molecule both the alpha chain and the beta chain contain the polymorphisms specifying the peptide binding specificities, resulting in up to 4 different molecules. HLA-DPA1
ENSG00000169347 2813 glycoprotein 2 This gene encodes an integral membrane protein that is secreted from intracellular zymogen granules and associates with the plasma membrane via glycosylphosphatidylinositol (GPI) linkage. The encoded protein binds pathogens such as enterobacteria, thereby playing an important role in the innate immune response. The C-terminus of this protein is related to the C-terminus of the protein encoded by the neighboring gene, uromodulin (UMOD). Alternative splicing results in multiple transcript variants. GP2
ENSG00000100316 6122 ribosomal protein L3 Ribosomes, the complexes that catalyze protein synthesis, consist of a small 40S subunit and a large 60S subunit. Together these subunits are composed of 4 RNA species and approximately 80 structurally distinct proteins. This gene encodes a ribosomal protein that is a component of the 60S subunit. The protein belongs to the L3P family of ribosomal proteins and it is located in the cytoplasm. The protein can bind to the HIV-1 TAR mRNA, and it has been suggested that the protein contributes to tat-mediated transactivation. This gene is co-transcribed with several small nucleolar RNA genes, which are located in several of this gene’s introns. Alternate transcriptional splice variants, encoding different isoforms, have been characterized. As is typical for genes encoding ribosomal proteins, there are multiple processed pseudogenes of this gene dispersed through the genome. RPL3
ENSG00000137154 6194 ribosomal protein S6 Ribosomes, the organelles that catalyze protein synthesis, consist of a small 40S subunit and a large 60S subunit. Together these subunits are composed of 4 RNA species and approximately 80 structurally distinct proteins. This gene encodes a cytoplasmic ribosomal protein that is a component of the 40S subunit. The protein belongs to the S6E family of ribosomal proteins. It is the major substrate of protein kinases in the ribosome, with subsets of five C-terminal serine residues phosphorylated by different protein kinases. Phosphorylation is induced by a wide range of stimuli, including growth factors, tumor-promoting agents, and mitogens. Dephosphorylation occurs at growth arrest. The protein may contribute to the control of cell growth and proliferation through the selective translation of particular classes of mRNA. As is typical for genes encoding ribosomal proteins, there are multiple processed pseudogenes of this gene dispersed through the genome. RPS6
ENSG00000155657 7273 titin This gene encodes a large abundant protein of striated muscle. The product of this gene is divided into two regions, an N-terminal I-band and a C-terminal A-band. The I-band, which is the elastic part of the molecule, contains two regions of tandem immunoglobulin domains on either side of a PEVK region that is rich in proline, glutamate, valine and lysine. The A-band, which is thought to act as a protein-ruler, contains a mixture of immunoglobulin and fibronectin repeats, and possesses kinase activity. An N-terminal Z-disc region and a C-terminal M-line region bind to the Z-line and M-line of the sarcomere, respectively, so that a single titin molecule spans half the length of a sarcomere. Titin also contains binding sites for muscle associated proteins so it serves as an adhesion template for the assembly of contractile machinery in muscle cells. It has also been identified as a structural protein for chromosomes. Alternative splicing of this gene results in multiple transcript variants. Considerable variability exists in the I-band, the M-line and the Z-disc regions of titin. Variability in the I-band region contributes to the differences in elasticity of different titin isoforms and, therefore, to the differences in elasticity of different muscle types. Mutations in this gene are associated with familial hypertrophic cardiomyopathy 9, and autoantibodies to titin are produced in patients with the autoimmune disease scleroderma. TTN
ENSG00000175535 5406 pancreatic lipase This gene is a member of the lipase gene family. It encodes a carboxyl esterase that hydrolyzes insoluble, emulsified triglycerides, and is essential for the efficient digestion of dietary fats. This gene is expressed specifically in the pancreas. PNLIP
ENSG00000140416 7168 tropomyosin 1 (alpha) This gene is a member of the tropomyosin family of highly conserved, widely distributed actin-binding proteins involved in the contractile system of striated and smooth muscles and the cytoskeleton of non-muscle cells. Tropomyosin is composed of two alpha-helical chains arranged as a coiled-coil. It is polymerized end to end along the two grooves of actin filaments and provides stability to the filaments. The encoded protein is one type of alpha helical chain that forms the predominant tropomyosin of striated muscle, where it also functions in association with the troponin complex to regulate the calcium-dependent interaction of actin and myosin during muscle contraction. In smooth muscle and non-muscle cells, alternatively spliced transcript variants encoding a range of isoforms have been described. Mutations in this gene are associated with type 3 familial hypertrophic cardiomyopathy. TPM1
ENSG00000166165 1152 creatine kinase B The protein encoded by this gene is a cytoplasmic enzyme involved in energy homeostasis. The encoded protein reversibly catalyzes the transfer of phosphate between ATP and various phosphogens such as creatine phosphate. It acts as a homodimer in brain as well as in other tissues, and as a heterodimer with a similar muscle isozyme in heart. The encoded protein is a member of the ATP:guanido phosphotransferase protein family. A pseudogene of this gene has been characterized. CKB
ENSG00000142789 10136 chymotrypsin like elastase family member 3A Elastases form a subfamily of serine proteases that hydrolyze many proteins in addition to elastin. Humans have six elastase genes which encode the structurally similar proteins elastase 1, 2, 2A, 2B, 3A, and 3B. Unlike other elastases, elastase 3A has little elastolytic activity. Like most of the human elastases, elastase 3A is secreted from the pancreas as a zymogen and, like other serine proteases such as trypsin, chymotrypsin and kallikrein, it has a digestive function in the intestine. Elastase 3A preferentially cleaves proteins after alanine residues. Elastase 3A may also function in the intestinal transport and metabolism of cholesterol. Both elastase 3A and elastase 3B have been referred to as protease E and as elastase 1. CELA3A
ENSG00000170835 1056 carboxyl ester lipase The protein encoded by this gene is a glycoprotein secreted from the pancreas into the digestive tract and from the lactating mammary gland into human milk. The physiological role of this protein is in cholesterol and lipid-soluble vitamin ester hydrolysis and absorption. This encoded protein promotes large chylomicron production in the intestine. Also its presence in plasma suggests its interactions with cholesterol and oxidized lipoproteins to modulate the progression of atherosclerosis. In pancreatic tumoral cells, this encoded protein is thought to be sequestrated within the Golgi compartment and is probably not secreted. This gene contains a variable number of tandem repeat (VNTR) polymorphism in the coding region that may influence the function of the encoded protein. CEL
ENSG00000122862 5552 serglycin This gene encodes a protein best known as a hematopoietic cell granule proteoglycan. Proteoglycans stored in the secretory granules of many hematopoietic cells also contain a protease-resistant peptide core, which may be important for neutralizing hydrolytic enzymes. This encoded protein was found to be associated with the macromolecular complex of granzymes and perforin, which may serve as a mediator of granule-mediated apoptosis. Two transcript variants, only one of them protein-coding, have been found for this gene. SRGN
ENSG00000153002 1360 carboxypeptidase B1 Three different procarboxypeptidases A and two different procarboxypeptidases B have been isolated. The B1 and B2 forms differ from each other mainly in isoelectric point. Carboxypeptidase B1 is a highly tissue-specific protein and is a useful serum marker for acute pancreatitis and dysfunction of pancreatic transplants. It is not elevated in pancreatic carcinoma. CPB1
ENSG00000196126 3123 major histocompatibility complex, class II, DR beta 1 HLA-DRB1 belongs to the HLA class II beta chain paralogs. The class II molecule is a heterodimer consisting of an alpha (DRA) and a beta chain (DRB), both anchored in the membrane. It plays a central role in the immune system by presenting peptides derived from extracellular proteins. Class II molecules are expressed in antigen presenting cells (APC: B lymphocytes, dendritic cells, macrophages). The beta chain is approximately 26-28 kDa. It is encoded by 6 exons. Exon one encodes the leader peptide; exons 2 and 3 encode the two extracellular domains; exon 4 encodes the transmembrane domain; and exon 5 encodes the cytoplasmic tail. Within the DR molecule the beta chain contains all the polymorphisms specifying the peptide binding specificities. Hundreds of DRB1 alleles have been described and typing for these polymorphisms is routinely done for bone marrow and kidney transplantation. DRB1 is expressed at a level five times higher than its paralogs DRB3, DRB4 and DRB5. DRB1 is present in all individuals. Allelic variants of DRB1 are linked with either none or one of the genes DRB3, DRB4 and DRB5. There are 5 related pseudogenes: DRB2, DRB6, DRB7, DRB8 and DRB9. HLA-DRB1
ENSG00000196126 105369230 HLA class II histocompatibility antigen, DRB1-7 beta chain NA LOC105369230
ENSG00000074800 2023 enolase 1 This gene encodes alpha-enolase, one of three enolase isoenzymes found in mammals. Each isoenzyme is a homodimer composed of 2 alpha, 2 gamma, or 2 beta subunits, and functions as a glycolytic enzyme. Alpha-enolase in addition, functions as a structural lens protein (tau-crystallin) in the monomeric form. Alternative splicing of this gene results in a shorter isoform that has been shown to bind to the c-myc promoter and function as a tumor suppressor. Several pseudogenes have been identified, including one on the long arm of chromosome 1. Alpha-enolase has also been identified as an autoantigen in Hashimoto encephalopathy. ENO1
ENSG00000162511 7805 lysosomal protein transmembrane 5 This gene encodes a transmembrane receptor that is associated with lysosomes. The encoded protein, also known as E3 protein, may play a role in hematopoiesis. LAPTM5
ENSG00000266844 ENSG00000266844 NA NA RP11-862L9.3
ENSG00000156508 1915 eukaryotic translation elongation factor 1 alpha 1 This gene encodes an isoform of the alpha subunit of the elongation factor-1 complex, which is responsible for the enzymatic delivery of aminoacyl tRNAs to the ribosome. This isoform (alpha 1) is expressed in brain, placenta, lung, liver, kidney, and pancreas, and the other isoform (alpha 2) is expressed in brain, heart and skeletal muscle. This isoform is identified as an autoantigen in 66% of patients with Felty syndrome. This gene has been found to have multiple copies on many chromosomes, some of which, if not all, represent different pseudogenes. EEF1A1
ENSG00000231500 6222 ribosomal protein S18 Ribosomes, the organelles that catalyze protein synthesis, consist of a small 40S subunit and a large 60S subunit. Together these subunits are composed of 4 RNA species and approximately 80 structurally distinct proteins. This gene encodes a ribosomal protein that is a component of the 40S subunit. The protein belongs to the S13P family of ribosomal proteins. It is located in the cytoplasm. The gene product of the E. coli ortholog (ribosomal protein S13) is involved in the binding of fMet-tRNA, and thus, in the initiation of translation. This gene is an ortholog of mouse Ke3. As is typical for genes encoding ribosomal proteins, there are multiple processed pseudogenes of this gene dispersed through the genome. RPS18
ENSG00000120885 1191 clusterin The protein encoded by this gene is a secreted chaperone that can under some stress conditions also be found in the cell cytosol. It has been suggested to be involved in several basic biological events such as cell death, tumor progression, and neurodegenerative disorders. Alternate splicing results in both coding and non-coding variants. CLU
ENSG00000206503 3105 major histocompatibility complex, class I, A HLA-A belongs to the HLA class I heavy chain paralogues. This class I molecule is a heterodimer consisting of a heavy chain and a light chain (beta-2 microglobulin). The heavy chain is anchored in the membrane. Class I molecules play a central role in the immune system by presenting peptides derived from the endoplasmic reticulum lumen. They are expressed in nearly all cells. The heavy chain is approximately 45 kDa and its gene contains 8 exons. Exon 1 encodes the leader peptide, exons 2 and 3 encode the alpha1 and alpha2 domains, which both bind the peptide, exon 4 encodes the alpha3 domain, exon 5 encodes the transmembrane region, and exons 6 and 7 encode the cytoplasmic tail. Polymorphisms within exon 2 and exon 3 are responsible for the peptide binding specificity of each class one molecule. Typing for these polymorphisms is routinely done for bone marrow and kidney transplantation. Hundreds of HLA-A alleles have been described. HLA-A
ENSG00000180353 3059 hematopoietic cell-specific Lyn substrate 1 NA HCLS1
ENSG00000101335 10398 myosin light chain 9 Myosin, a structural component of muscle, consists of two heavy chains and four light chains. The protein encoded by this gene is a myosin light chain that may regulate muscle contraction by modulating the ATPase activity of myosin heads. The encoded protein binds calcium and is activated by myosin light chain kinase. Two transcript variants encoding different isoforms have been found for this gene. MYL9
ENSG00000059804 6515 solute carrier family 2 member 3 NA SLC2A3
ENSG00000185303 729238 surfactant protein A2 This gene is one of several genes encoding pulmonary-surfactant associated proteins (SFTPA) located on chromosome 10. Mutations in this gene and a highly similar gene located nearby, which affect the highly conserved carbohydrate recognition domain, are associated with idiopathic pulmonary fibrosis. The current version of the assembly displays only a single centromeric SFTPA gene pair rather than the two gene pairs shown in the previous assembly which were thought to have resulted from a duplication. SFTPA2
ENSG00000143119 963 CD53 molecule The protein encoded by this gene is a member of the transmembrane 4 superfamily, also known as the tetraspanin family. Most of these members are cell-surface proteins that are characterized by the presence of four hydrophobic domains. The proteins mediate signal transduction events that play a role in the regulation of cell development, activation, growth and motility. This encoded protein is a cell surface glycoprotein that is known to complex with integrins. It contributes to the transduction of CD2-generated signals in T cells and natural killer cells and has been suggested to play a role in growth regulation. Familial deficiency of this gene has been linked to an immunodeficiency associated with recurrent infectious diseases caused by bacteria, fungi and viruses. Alternative splicing results in multiple transcript variants. CD53
ENSG00000115386 5967 regenerating family member 1 alpha This gene is a type I subclass member of the Reg gene family. The Reg gene family is a multigene family grouped into four subclasses, types I, II, III and IV, based on the primary structures of the encoded proteins. This gene encodes a protein that is secreted by the exocrine pancreas. It is associated with islet cell regeneration and diabetogenesis and may be involved in pancreatic lithogenesis. Reg family members REG1B, REGL, PAP and this gene are tandemly clustered on chromosome 2p12 and may have arisen from the same ancestral gene by gene duplication. REG1A
ENSG00000197321 6840 supervillin This gene encodes a bipartite protein with distinct amino- and carboxy-terminal domains. The amino-terminus contains nuclear localization signals and the carboxy-terminus contains numerous consecutive sequences with extensive similarity to proteins in the gelsolin family of actin-binding proteins, which cap, nucleate, and/or sever actin filaments. The gene product is tightly associated with both actin filaments and plasma membranes, suggesting a role as a high-affinity link between the actin cytoskeleton and the membrane. The encoded protein appears to aid in both myosin II assembly during cell spreading and disassembly of focal adhesions. Several transcript variants encoding different isoforms of supervillin have been described. SVIL
ENSG00000165795 57447 NDRG family member 2 This gene is a member of the N-myc downregulated gene family which belongs to the alpha/beta hydrolase superfamily. The protein encoded by this gene is a cytoplasmic protein that may play a role in neurite outgrowth. This gene may be involved in glioblastoma carcinogenesis. Several alternatively spliced transcript variants of this gene have been described, but the full-length nature of some of these variants has not been determined. NDRG2
ENSG00000110719 10312 T-cell immune regulator 1, ATPase H+ transporting V0 subunit a3 Through alternate splicing, this gene encodes two proteins with similarity to subunits of the vacuolar ATPase (V-ATPase) but the encoded proteins seem to have different functions. V-ATPase is a multisubunit enzyme that mediates acidification of eukaryotic intracellular organelles. V-ATPase dependent organelle acidification is necessary for such intracellular processes as protein sorting, zymogen activation, and receptor-mediated endocytosis. V-ATPase is comprised of a cytosolic V1 domain and a transmembrane V0 domain. Mutations in this gene are associated with infantile malignant osteopetrosis. TCIRG1
ENSG00000075624 60 actin, beta This gene encodes one of six different actin proteins. Actins are highly conserved proteins that are involved in cell motility, structure, and integrity. This actin is a major constituent of the contractile apparatus and one of the two nonmuscle cytoskeletal actins. ACTB
ENSG00000113140 6678 secreted protein acidic and cysteine rich This gene encodes a cysteine-rich acidic matrix-associated protein. The encoded protein is required for the collagen in bone to become calcified but is also involved in extracellular matrix synthesis and promotion of changes to cell shape. The gene product has been associated with tumor suppression but has also been correlated with metastasis based on changes to cell shape which can promote tumor cell invasion. Three transcript variants encoding different isoforms have been found for this gene. SPARC
ENSG00000023445 330 baculoviral IAP repeat containing 3 This gene encodes a member of the IAP family of proteins that inhibit apoptosis by binding to tumor necrosis factor receptor-associated factors TRAF1 and TRAF2, probably by interfering with activation of ICE-like proteases. The encoded protein inhibits apoptosis induced by serum deprivation but does not affect apoptosis resulting from exposure to menadione, a potent inducer of free radicals. It contains 3 baculovirus IAP repeats and a ring finger domain. Transcript variants encoding the same isoform have been identified. BIRC3
ENSG00000142937 6202 ribosomal protein S8 Ribosomes, the organelles that catalyze protein synthesis, consist of a small 40S subunit and a large 60S subunit. Together these subunits are composed of 4 RNA species and approximately 80 structurally distinct proteins. This gene encodes a ribosomal protein that is a component of the 40S subunit. The protein belongs to the S8E family of ribosomal proteins. It is located in the cytoplasm. Increased expression of this gene in colorectal tumors and colon polyps compared to matched normal colonic mucosa has been observed. This gene is co-transcribed with the small nucleolar RNA genes U38A, U38B, U39, and U40, which are located in its fourth, fifth, first, and second introns, respectively. As is typical for genes encoding ribosomal proteins, there are multiple processed pseudogenes of this gene dispersed through the genome. RPS8
ENSG00000211896 ENSG00000211896 immunoglobulin heavy constant gamma 1 (G1m marker) NA IGHG1
ENSG00000092054 4625 myosin, heavy chain 7, cardiac muscle, beta Muscle myosin is a hexameric protein containing 2 heavy chain subunits, 2 alkali light chain subunits, and 2 regulatory light chain subunits. This gene encodes the beta (or slow) heavy chain subunit of cardiac myosin. It is expressed predominantly in normal human ventricle. It is also expressed in skeletal muscle tissues rich in slow-twitch type I muscle fibers. Changes in the relative abundance of this protein and the alpha (or fast) heavy subunit of cardiac myosin correlate with the contractile velocity of cardiac muscle. Its expression is also altered during thyroid hormone depletion and hemodynamic overloading. Mutations in this gene are associated with familial hypertrophic cardiomyopathy, myosin storage myopathy, dilated cardiomyopathy, and Laing early-onset distal myopathy. MYH7
ENSG00000131095 2670 glial fibrillary acidic protein This gene encodes one of the major intermediate filament proteins of mature astrocytes. It is used as a marker to distinguish astrocytes from other glial cells during development. Mutations in this gene cause Alexander disease, a rare disorder of astrocytes in the central nervous system. Alternative splicing results in multiple transcript variants encoding distinct isoforms. GFAP
ENSG00000158710 8407 transgelin 2 The protein encoded by this gene is similar to the protein transgelin, which is one of the earliest markers of differentiated smooth muscle. The specific function of this protein has not yet been determined, although it is thought to be a tumor suppressor. Multiple transcript variants encoding different isoforms have been found for this gene. TAGLN2
ENSG00000122852 653509 surfactant protein A1 This gene encodes a lung surfactant protein that is a member of a subfamily of C-type lectins called collectins. The encoded protein binds specific carbohydrate moieties found on lipids and on the surface of microorganisms. This protein plays an essential role in surfactant homeostasis and in the defense against respiratory pathogens. Mutations in this gene are associated with idiopathic pulmonary fibrosis. Alternate splicing results in multiple transcript variants. SFTPA1
ENSG00000137392 1208 colipase The protein encoded by this gene is a cofactor needed by pancreatic lipase for efficient dietary lipid hydrolysis. It binds to the C-terminal, non-catalytic domain of lipase, thereby stabilizing an active conformation and considerably increasing the overall hydrophobic binding site. The gene product allows lipase to anchor noncovalently to the surface of lipid micelles, counteracting the destabilizing influence of intestinal bile salts. This cofactor is only expressed in pancreatic acinar cells, suggesting regulation of expression by tissue-specific elements. Three transcript variants encoding different isoforms have been found for this gene. CLPS
ENSG00000118503 7128 TNF alpha induced protein 3 This gene was identified as a gene whose expression is rapidly induced by the tumor necrosis factor (TNF). The protein encoded by this gene is a zinc finger protein and ubiquitin-editing enzyme, and has been shown to inhibit NF-kappa B activation as well as TNF-mediated apoptosis. The encoded protein, which has both ubiquitin ligase and deubiquitinase activities, is involved in the cytokine-mediated immune and inflammatory responses. Several transcript variants encoding the same protein have been found for this gene. TNFAIP3
ENSG00000103187 23406 coactosin like F-actin binding protein 1 This gene encodes one of the numerous actin-binding proteins which regulate the actin cytoskeleton. This protein binds F-actin, and also interacts with 5-lipoxygenase, which is the first committed enzyme in leukotriene biosynthesis. Although this gene has been reported to map to chromosome 17 in the Smith-Magenis syndrome region, the best alignments for this gene are to chromosome 16. The Smith-Magenis syndrome region is the site of two related pseudogenes. COTL1
ENSG00000143947 6233 ribosomal protein S27a Ubiquitin, a highly conserved protein that has a major role in targeting cellular proteins for degradation by the 26S proteasome, is synthesized as a precursor protein consisting of either polyubiquitin chains or a single ubiquitin fused to an unrelated protein. This gene encodes a fusion protein consisting of ubiquitin at the N terminus and ribosomal protein S27a at the C terminus. When expressed in yeast, the protein is post-translationally processed, generating free ubiquitin monomer and ribosomal protein S27a. Ribosomal protein S27a is a component of the 40S subunit of the ribosome and belongs to the S27AE family of ribosomal proteins. It contains C4-type zinc finger domains and is located in the cytoplasm. Pseudogenes derived from this gene are present in the genome. As with ribosomal protein S27a, ribosomal protein L40 is also synthesized as a fusion protein with ubiquitin; similarly, ribosomal protein S30 is synthesized as a fusion protein with the ubiquitin-like protein fubi. Multiple alternatively spliced transcript variants that encode the same proteins have been identified. RPS27A
ENSG00000156804 114907 F-box protein 32 This gene encodes a member of the F-box protein family which is characterized by an approximately 40 amino acid motif, the F-box. The F-box proteins constitute one of the four subunits of the ubiquitin protein ligase complex called SCFs (SKP1-cullin-F-box), which function in phosphorylation-dependent ubiquitination. The F-box proteins are divided into 3 classes: Fbws containing WD-40 domains, Fbls containing leucine-rich repeats, and Fbxs containing either different protein-protein interaction modules or no recognizable motifs. The protein encoded by this gene belongs to the Fbxs class and contains an F-box domain. This protein is highly expressed during muscle atrophy, whereas mice deficient in this gene were found to be resistant to atrophy. This protein is thus a potential drug target for the treatment of muscle atrophy. Alternative splicing results in multiple transcript variants encoding different isoforms. FBXO32
ENSG00000223865 3115 major histocompatibility complex, class II, DP beta 1 HLA-DPB belongs to the HLA class II beta chain paralogues. This class II molecule is a heterodimer consisting of an alpha (DPA) and a beta chain (DPB), both anchored in the membrane. It plays a central role in the immune system by presenting peptides derived from extracellular proteins. Class II molecules are expressed in antigen presenting cells (APC: B lymphocytes, dendritic cells, macrophages). The beta chain is approximately 26-28 kDa and its gene contains 6 exons. Exon one encodes the leader peptide, exons 2 and 3 encode the two extracellular domains, exon 4 encodes the transmembrane domain and exon 5 encodes the cytoplasmic tail. Within the DP molecule both the alpha chain and the beta chain contain the polymorphisms specifying the peptide binding specificities, resulting in up to 4 different molecules. HLA-DPB1
ENSG00000102879 11151 coronin 1A This gene encodes a member of the WD repeat protein family. WD repeats are minimally conserved regions of approximately 40 amino acids typically bracketed by gly-his and trp-asp (GH-WD), which may facilitate formation of heterotrimeric or multiprotein complexes. Members of this family are involved in a variety of cellular processes, including cell cycle progression, signal transduction, apoptosis, and gene regulation. Alternative splicing results in multiple transcript variants. A related pseudogene has been defined on chromosome 16. CORO1A
ENSG00000206172 3039 hemoglobin subunit alpha 1 The human alpha globin gene cluster located on chromosome 16 spans about 30 kb and includes seven loci: 5’- zeta - pseudozeta - mu - pseudoalpha-1 - alpha-2 - alpha-1 - theta - 3’. The alpha-2 (HBA2) and alpha-1 (HBA1) coding sequences are identical. These genes differ slightly over the 5’ untranslated regions and the introns, but they differ significantly over the 3’ untranslated regions. Two alpha chains plus two beta chains constitute HbA, which in normal adult life comprises about 97% of the total hemoglobin; alpha chains combine with delta chains to constitute HbA-2, which with HbF (fetal hemoglobin) makes up the remaining 3% of adult hemoglobin. Alpha thalassemias result from deletions of each of the alpha genes as well as deletions of both HBA2 and HBA1; some nondeletion alpha thalassemias have also been reported. HBA1
ENSG00000173641 27129 heat shock protein family B (small) member 7 NA HSPB7
ENSG00000178104 9659 phosphodiesterase 4D interacting protein The protein encoded by this gene serves to anchor phosphodiesterase 4D to the Golgi/centrosome region of the cell. Defects in this gene may be a cause of myeloproliferative disorder (MBD) associated with eosinophilia. Several transcript variants encoding different isoforms have been found for this gene. PDE4DIP
ENSG00000163131 1520 cathepsin S The protein encoded by this gene, a member of the peptidase C1 family, is a lysosomal cysteine proteinase that may participate in the degradation of antigenic proteins to peptides for presentation on MHC class II molecules. The encoded protein can function as an elastase over a broad pH range in alveolar macrophages. Alternatively spliced transcript variants encoding distinct isoforms have been found for this gene. CTSS
ENSG00000089157 6175 ribosomal protein lateral stalk subunit P0 Ribosomes, the organelles that catalyze protein synthesis, consist of a small 40S subunit and a large 60S subunit. Together these subunits are composed of 4 RNA species and approximately 80 structurally distinct proteins. This gene encodes a ribosomal protein that is a component of the 60S subunit. The protein, which is the functional equivalent of the E. coli L10 ribosomal protein, belongs to the L10P family of ribosomal proteins. It is a neutral phosphoprotein with a C-terminal end that is nearly identical to the C-terminal ends of the acidic ribosomal phosphoproteins P1 and P2. The P0 protein can interact with P1 and P2 to form a pentameric complex consisting of P1 and P2 dimers, and a P0 monomer. The protein is located in the cytoplasm. Transcript variants derived from alternative splicing exist; they encode the same protein. As is typical for genes encoding ribosomal proteins, there are multiple processed pseudogenes of this gene dispersed through the genome. RPLP0
ENSG00000180354 222166 maturin, neural progenitor differentiation regulator homolog (Xenopus) NA MTURN
ENSG00000168928 440387 chymotrypsinogen B2 NA CTRB2
ENSG00000090104 5996 regulator of G-protein signaling 1 This gene encodes a member of the regulator of G-protein signalling family. This protein is located on the cytosolic side of the plasma membrane and contains a conserved, 120 amino acid motif called the RGS domain. The protein attenuates the signalling activity of G-proteins by binding to activated, GTP-bound G alpha subunits and acting as a GTPase activating protein (GAP), increasing the rate of conversion of the GTP to GDP. This hydrolysis allows the G alpha subunits to bind G beta/gamma subunit heterodimers, forming inactive G-protein heterotrimers, thereby terminating the signal. RGS1
ENSG00000142676 6135 ribosomal protein L11 Ribosomes, the organelles that catalyze protein synthesis, consist of a small 40S subunit and a large 60S subunit. Together these subunits are composed of 4 RNA species and approximately 80 structurally distinct proteins. This gene encodes a ribosomal protein that is a component of the 60S subunit. The protein belongs to the L5P family of ribosomal proteins. It is located in the cytoplasm. The protein probably associates with the 5S rRNA. Alternatively spliced transcript variants encoding different isoforms have been found for this gene. As is typical for genes encoding ribosomal proteins, there are multiple processed pseudogenes of this gene dispersed through the genome. RPL11
ENSG00000168028 3921 ribosomal protein SA Laminins, a family of extracellular matrix glycoproteins, are the major noncollagenous constituent of basement membranes. They have been implicated in a wide variety of biological processes including cell adhesion, differentiation, migration, signaling, neurite outgrowth and metastasis. Many of the effects of laminin are mediated through interactions with cell surface receptors. These receptors include members of the integrin family, as well as non-integrin laminin-binding proteins. This gene encodes a high-affinity, non-integrin family, laminin receptor 1. This receptor has been variously called 67 kD laminin receptor, 37 kD laminin receptor precursor (37LRP) and p40 ribosome-associated protein. The amino acid sequence of laminin receptor 1 is highly conserved through evolution, suggesting a key biological function. It has been observed that the level of the laminin receptor transcript is higher in colon carcinoma tissue and lung cancer cell line than their normal counterparts. Also, there is a correlation between the upregulation of this polypeptide in cancer cells and their invasive and metastatic phenotype. Multiple copies of this gene exist, however, most of them are pseudogenes thought to have arisen from retropositional events. Two alternatively spliced transcript variants encoding the same protein have been found for this gene. RPSA
ENSG00000021300 58473 pleckstrin homology domain containing B1 NA PLEKHB1
ENSG00000211899 ENSG00000211899 immunoglobulin heavy constant mu NA IGHM
ENSG00000219073 23436 chymotrypsin like elastase family member 3B Elastases form a subfamily of serine proteases that hydrolyze many proteins in addition to elastin. Humans have six elastase genes which encode the structurally similar proteins elastase 1, 2, 2A, 2B, 3A, and 3B. Unlike other elastases, elastase 3B has little elastolytic activity. Like most of the human elastases, elastase 3B is secreted from the pancreas as a zymogen and, like other serine proteases such as trypsin, chymotrypsin and kallikrein, it has a digestive function in the intestine. Elastase 3B preferentially cleaves proteins after alanine residues. Elastase 3B may also function in the intestinal transport and metabolism of cholesterol. Both elastase 3A and elastase 3B have been referred to as protease E and as elastase 1, and excretion of this protein in fecal material is frequently used as a measure of pancreatic function in clinical assays. CELA3B
ENSG00000115415 6772 signal transducer and activator of transcription 1 The protein encoded by this gene is a member of the STAT protein family. In response to cytokines and growth factors, STAT family members are phosphorylated by the receptor associated kinases, and then form homo- or heterodimers that translocate to the cell nucleus where they act as transcription activators. This protein can be activated by various ligands including interferon-alpha, interferon-gamma, EGF, PDGF and IL6. This protein mediates the expression of a variety of genes, which is thought to be important for cell viability in response to different cell stimuli and pathogens. Two alternatively spliced transcript variants encoding distinct isoforms have been described. STAT1
ENSG00000109472 1363 carboxypeptidase E This gene encodes a member of the M14 family of metallocarboxypeptidases. The encoded preproprotein is proteolytically processed to generate the mature peptidase. This peripheral membrane protein cleaves C-terminal amino acid residues and is involved in the biosynthesis of peptide hormones and neurotransmitters, including insulin. This protein may also function independently of its peptidase activity, as a neurotrophic factor that promotes neuronal survival, and as a sorting receptor that binds to regulated secretory pathway proteins, including prohormones. Mutations in this gene are implicated in type 2 diabetes. CPE
ENSG00000078804 58476 tumor protein p53 inducible nuclear protein 2 NA TP53INP2
ENSG00000095637 10580 sorbin and SH3 domain containing 1 This gene encodes a CBL-associated protein which functions in the signaling and stimulation of insulin. Mutations in this gene may be associated with human disorders of insulin resistance. Alternative splicing results in multiple transcript variants. SORBS1
ENSG00000168925 1504 chymotrypsinogen B1 The protein encoded by this gene is one of a family of serine proteases that is secreted into the gastrointestinal tract as an inactive precursor, which is activated by proteolytic cleavage with trypsin. CTRB1
ENSG00000160255 3689 integrin subunit beta 2 This gene encodes an integrin beta chain, which combines with multiple different alpha chains to form different integrin heterodimers. Integrins are integral cell-surface proteins that participate in cell adhesion as well as cell-surface mediated signalling. The encoded protein plays an important role in immune response and defects in this gene cause leukocyte adhesion deficiency. Alternative splicing results in multiple transcript variants. ITGB2
ENSG00000090339 3383 intercellular adhesion molecule 1 This gene encodes a cell surface glycoprotein which is typically expressed on endothelial cells and cells of the immune system. It binds to integrins of type CD11a / CD18, or CD11b / CD18 and is also exploited by Rhinovirus as a receptor. ICAM1
ENSG00000157601 4599 MX dynamin like GTPase 1 This gene encodes a guanosine triphosphate (GTP)-metabolizing protein that participates in the cellular antiviral response. The encoded protein is induced by type I and type II interferons and antagonizes the replication process of several different RNA and DNA viruses. There is a related gene located adjacent to this gene on chromosome 21, and there are multiple pseudogenes located in a cluster on chromosome 4. Alternative splicing results in multiple transcript variants. MX1
ENSG00000105372 6223 ribosomal protein S19 Ribosomes, the organelles that catalyze protein synthesis, consist of a small 40S subunit and a large 60S subunit. Together these subunits are composed of 4 RNA species and approximately 80 structurally distinct proteins. This gene encodes a ribosomal protein that is a component of the 40S subunit. The protein belongs to the S19E family of ribosomal proteins. It is located in the cytoplasm. Mutations in this gene cause Diamond-Blackfan anemia (DBA), a constitutional erythroblastopenia characterized by absent or decreased erythroid precursors, in a subset of patients. This suggests a possible extra-ribosomal function for this gene in erythropoietic differentiation and proliferation, in addition to its ribosomal function. Higher expression levels of this gene in some primary colon carcinomas compared to matched normal colon tissues has been observed. As is typical for genes encoding ribosomal proteins, there are multiple processed pseudogenes of this gene dispersed through the genome. RPS19
ENSG00000104879 1158 creatine kinase, M-type The protein encoded by this gene is a cytoplasmic enzyme involved in energy homeostasis and is an important serum marker for myocardial infarction. The encoded protein reversibly catalyzes the transfer of phosphate between ATP and various phosphogens such as creatine phosphate. It acts as a homodimer in striated muscle as well as in other tissues, and as a heterodimer with a similar brain isozyme in heart. The encoded protein is a member of the ATP:guanido phosphotransferase protein family. CKM
ENSG00000109846 1410 crystallin alpha B Mammalian lens crystallins are divided into alpha, beta, and gamma families. Alpha crystallins are composed of two gene products: alpha-A and alpha-B, for acidic and basic, respectively. Alpha crystallins can be induced by heat shock and are members of the small heat shock protein (HSP20) family. They act as molecular chaperones although they do not renature proteins and release them in the fashion of a true chaperone; instead they hold them in large soluble aggregates. Post-translational modifications decrease the ability to chaperone. These heterogeneous aggregates consist of 30-40 subunits; the alpha-A and alpha-B subunits have a 3:1 ratio, respectively. Two additional functions of alpha crystallins are an autokinase activity and participation in the intracellular architecture. The encoded protein has been identified as a moonlighting protein based on its ability to perform mechanistically distinct functions. Alpha-A and alpha-B gene products are differentially expressed; alpha-A is preferentially restricted to the lens and alpha-B is expressed widely in many tissues and organs. Elevated expression of alpha-B crystallin occurs in many neurological diseases; a missense mutation cosegregated in a family with a desmin-related myopathy. Alternative splicing results in multiple transcript variants. CRYAB
ENSG00000196205 ENSG00000196205 eukaryotic translation elongation factor 1 alpha 1 pseudogene 5 NA EEF1A1P5
ENSG00000168484 6440 surfactant protein C This gene encodes the pulmonary-associated surfactant protein C (SPC), an extremely hydrophobic surfactant protein essential for lung function and homeostasis after birth. Pulmonary surfactant is a surface-active lipoprotein complex composed of 90% lipids and 10% proteins which include plasma proteins and apolipoproteins SPA, SPB, SPC and SPD. The surfactant is secreted by the alveolar cells of the lung and maintains the stability of pulmonary tissue by reducing the surface tension of fluids that coat the lung. Multiple mutations in this gene have been identified, which cause pulmonary surfactant metabolism dysfunction type 2, also called pulmonary alveolar proteinosis due to surfactant protein C deficiency, and are associated with interstitial lung disease in older infants, children, and adults. Alternatively spliced transcript variants encoding different protein isoforms have been identified. SFTPC
ENSG00000140853 84166 NLR family CARD domain containing 5 This gene encodes a member of the caspase recruitment domain-containing NLR family. This gene plays a role in cytokine response and antiviral immunity through its inhibition of NF-kappa-B activation and negative regulation of type I interferon signaling pathways. NLRC5
ENSG00000149273 6188 ribosomal protein S3 Ribosomes, the organelles that catalyze protein synthesis, consist of a small 40S subunit and a large 60S subunit. Together these subunits are composed of 4 RNA species and approximately 80 structurally distinct proteins. This gene encodes a ribosomal protein that is a component of the 40S subunit, where it forms part of the domain where translation is initiated. The protein belongs to the S3P family of ribosomal proteins. Studies of the mouse and rat proteins have demonstrated that the protein has an extraribosomal role as an endonuclease involved in the repair of UV-induced DNA damage. The protein appears to be located in both the cytoplasm and nucleus but not in the nucleolus. Higher levels of expression of this gene in colon adenocarcinomas and adenomatous polyps compared to adjacent normal colonic mucosa have been observed. This gene is co-transcribed with the small nucleolar RNA genes U15A and U15B, which are located in its first and fifth introns, respectively. As is typical for genes encoding ribosomal proteins, there are multiple processed pseudogenes of this gene dispersed through the genome. Multiple alternatively spliced transcript variants encoding different isoforms have been found for this gene. RPS3
ENSG00000111679 5777 protein tyrosine phosphatase, non-receptor type 6 The protein encoded by this gene is a member of the protein tyrosine phosphatase (PTP) family. PTPs are known to be signaling molecules that regulate a variety of cellular processes including cell growth, differentiation, mitotic cycle, and oncogenic transformation. N-terminal part of this PTP contains two tandem Src homolog (SH2) domains, which act as protein phospho-tyrosine binding domains, and mediate the interaction of this PTP with its substrates. This PTP is expressed primarily in hematopoietic cells, and functions as an important regulator of multiple signaling pathways in hematopoietic cells. This PTP has been shown to interact with, and dephosphorylate a wide spectrum of phospho-proteins involved in hematopoietic cell signaling. Multiple alternatively spliced variants of this gene, which encode distinct isoforms, have been reported. PTPN6
ENSG00000103811 1512 cathepsin H The protein encoded by this gene is a lysosomal cysteine proteinase important in the overall degradation of lysosomal proteins. It is composed of a dimer of disulfide-linked heavy and light chains, both produced from a single protein precursor. The encoded protein, which belongs to the peptidase C1 protein family, can act both as an aminopeptidase and as an endopeptidase. Increased expression of this gene has been correlated with malignant progression of prostate tumors. Alternate splicing of this gene results in multiple transcript variants encoding different isoforms. CTSH
ENSG00000130294 547 kinesin family member 1A The protein encoded by this gene is a member of the kinesin family and functions as an anterograde motor protein that transports membranous organelles along axonal microtubules. Mutations at this locus have been associated with spastic paraplegia-30 and hereditary sensory neuropathy IIC. Alternatively spliced transcript variants encoding distinct isoforms have been described. KIF1A
ENSG00000198668 801 calmodulin 1 (phosphorylase kinase, delta) This gene encodes a member of the EF-hand calcium-binding protein family. It is one of three genes which encode an identical calcium binding protein which is one of the four subunits of phosphorylase kinase. Two pseudogenes have been identified on chromosome 7 and X. Multiple transcript variants encoding different isoforms have been found for this gene. CALM1
ENSG00000198668 805 calmodulin 2 (phosphorylase kinase, delta) This gene is a member of the calmodulin gene family. There are three distinct calmodulin genes dispersed throughout the genome that encode the identical protein, but differ at the nucleotide level. Calmodulin is a calcium binding protein that plays a role in signaling pathways, cell cycle progression and proliferation. Several infants with severe forms of long-QT syndrome (LQTS) who displayed life-threatening ventricular arrhythmias together with delayed neurodevelopment and epilepsy were found to have mutations in either this gene or another member of the calmodulin gene family (PMID:23388215). Mutations in this gene have also been identified in patients with less severe forms of LQTS (PMID:24917665), while mutations in another calmodulin gene family member have been associated with catecholaminergic polymorphic ventricular tachycardia (CPVT)(PMID:23040497), a rare disorder thought to be the cause of a significant fraction of sudden cardiac deaths in young individuals. Pseudogenes of this gene are found on chromosomes 10, 13, and 17. Alternative splicing results in multiple transcript variants encoding different isoforms. CALM2
ENSG00000162734 8682 phosphoprotein enriched in astrocytes 15 This gene encodes a death effector domain-containing protein that functions as a negative regulator of apoptosis. The encoded protein is an endogenous substrate for protein kinase C. This protein is also overexpressed in type 2 diabetes mellitus, where it may contribute to insulin resistance in glucose uptake. Alternative splicing results in multiple transcript variants. PEA15
ENSG00000130303 684 bone marrow stromal cell antigen 2 Bone marrow stromal cells are involved in the growth and development of B-cells. The specific function of the protein encoded by the bone marrow stromal cell antigen 2 is undetermined; however, this protein may play a role in pre-B-cell growth and in rheumatoid arthritis. BST2
ENSG00000198125 4151 myoglobin This gene encodes a member of the globin superfamily and is expressed in skeletal and cardiac muscles. The encoded protein is a haemoprotein contributing to intracellular oxygen storage and transcellular facilitated diffusion of oxygen. At least three alternatively spliced transcript variants encoding the same protein have been reported. MB
ENSG00000152661 2697 gap junction protein alpha 1 This gene is a member of the connexin gene family. The encoded protein is a component of gap junctions, which are composed of arrays of intercellular channels that provide a route for the diffusion of low molecular weight materials from cell to cell. The encoded protein is the major protein of gap junctions in the heart that are thought to have a crucial role in the synchronized contraction of the heart and in embryonic development. A related intronless pseudogene has been mapped to chromosome 5. Mutations in this gene have been associated with oculodentodigital dysplasia, autosomal recessive craniometaphyseal dysplasia and heart malformations. GJA1
ENSG00000166963 4130 microtubule associated protein 1A This gene encodes a protein that belongs to the microtubule-associated protein family. The proteins of this family are thought to be involved in microtubule assembly, which is an essential step in neurogenesis. The product of this gene is a precursor polypeptide that presumably undergoes proteolytic processing to generate the final MAP1A heavy chain and LC2 light chain. Expression of this gene is almost exclusively in the brain. Studies of the rat microtubule-associated protein 1A gene suggested a role in early events of spinal cord development. MAP1A
ENSG00000187514 5757 prothymosin, alpha NA PTMA
ENSG00000187514 728026 prothymosin alpha-like NA LOC728026
ENSG00000172037 3913 laminin subunit beta 2 Laminins, a family of extracellular matrix glycoproteins, are the major noncollagenous constituent of basement membranes. They have been implicated in a wide variety of biological processes including cell adhesion, differentiation, migration, signaling, neurite outgrowth and metastasis. Laminins, composed of 3 non identical chains: laminin alpha, beta and gamma (formerly A, B1, and B2, respectively), form a cruciform structure consisting of 3 short arms, each formed by a different chain, and a long arm composed of all 3 chains. Each laminin chain is a multidomain protein encoded by a distinct gene. Several isoforms of each chain have been described. Different alpha, beta and gamma chain isomers combine to give rise to different heterotrimeric laminin isoforms which are designated by Arabic numerals in the order of their discovery, i.e. alpha1beta1gamma1 heterotrimer is laminin 1. The biological functions of the different chains and trimer molecules are largely unknown, but some of the chains have been shown to differ with respect to their tissue distribution, presumably reflecting diverse functions in vivo. This gene encodes the beta chain isoform laminin, beta 2. The beta 2 chain contains the 7 structural domains typical of beta chains of laminin, including the short alpha region. However, unlike beta 1 chain, beta 2 has a more restricted tissue distribution. It is enriched in the basement membrane of muscles at the neuromuscular junctions, kidney glomerulus and vascular smooth muscle. Transgenic mice in which the beta 2 chain gene was inactivated by homologous recombination, showed defects in the maturation of neuromuscular junctions and impairment of glomerular filtration. Alternative splicing involving a non consensus 5’ splice site (gc) in the 5’ UTR of this gene has been reported. It was suggested that inefficient splicing of this first intron, which does not change the protein sequence, results in a greater abundance of the unspliced form of the transcript than the spliced form. The full-length nature of the spliced transcript is not known. LAMB2
write.table(as.factor(out$query), paste0("../utilities/GTEX2013_sparse_load_sqrt/gene_names_clus_",9,".txt"), col.names = FALSE,
            row.names=FALSE, quote=FALSE);

Factor 10 Annotations

out <- mygene::queryMany(gene_list[10,],  scopes="ensembl.gene", fields=c("name", "summary", "symbol"), species="human");
## Finished
## Pass returnall=TRUE to return lists of duplicate or missing query terms.
kable(as.data.frame(out))
symbol query summary name X_id notfound
TG ENSG00000042832 Thyroglobulin (Tg) is a glycoprotein homodimer produced predominantly by the thyroid gland. It acts as a substrate for the synthesis of thyroxine and triiodothyronine as well as the storage of the inactive forms of thyroid hormone and iodine. Thyroglobulin is secreted from the endoplasmic reticulum to its site of iodination, and subsequent thyroxine biosynthesis, in the follicular lumen. Mutations in this gene cause thyroid dyshormonogenesis, manifested as goiter, and are associated with moderate to severe congenital hypothyroidism. Polymorphisms in this gene are associated with susceptibility to autoimmune thyroid diseases (AITD) such as Graves disease and Hashimoto thyroiditis. thyroglobulin 7038 NA
TPO ENSG00000115705 This gene encodes a membrane-bound glycoprotein. The encoded protein acts as an enzyme and plays a central role in thyroid gland function. The protein functions in the iodination of tyrosine residues in thyroglobulin and phenoxy-ester formation between pairs of iodinated tyrosines to generate the thyroid hormones, thyroxine and triiodothyronine. Mutations in this gene are associated with several disorders of thyroid hormonogenesis, including congenital hypothyroidism, congenital goiter, and thyroid hormone organification defect IIA. Multiple transcript variants encoding distinct isoforms have been identified for this gene, but the full-length nature of some variants has not been determined. thyroid peroxidase 7173 NA
DES ENSG00000175084 This gene encodes a muscle-specific class III intermediate filament. Homopolymers of this protein form a stable intracytoplasmic filamentous network connecting myofibrils to each other and to the plasma membrane. Mutations in this gene are associated with desmin-related myopathy, a familial cardiac and skeletal myopathy (CSM), and with distal myopathies. desmin 1674 NA
PAX8 ENSG00000125618 This gene encodes a member of the paired box (PAX) family of transcription factors. Members of this gene family typically encode proteins that contain a paired box domain, an octapeptide, and a paired-type homeodomain. This nuclear protein is involved in thyroid follicular cell development and expression of thyroid-specific genes. Mutations in this gene have been associated with thyroid dysgenesis, thyroid follicular carcinomas and atypical follicular thyroid adenomas. Alternatively spliced transcript variants encoding different isoforms have been described. paired box 8 7849 NA
CLU ENSG00000120885 The protein encoded by this gene is a secreted chaperone that can under some stress conditions also be found in the cell cytosol. It has been suggested to be involved in several basic biological events such as cell death, tumor progression, and neurodegenerative disorders. Alternate splicing results in both coding and non-coding variants. clusterin 1191 NA
NA ENSG00000090920 NA NA NA TRUE
RAP1GAP ENSG00000076864 This gene encodes a type of GTPase-activating-protein (GAP) that down-regulates the activity of the ras-related RAP1 protein. RAP1 acts as a molecular switch by cycling between an inactive GDP-bound form and an active GTP-bound form. The product of this gene, RAP1GAP, promotes the hydrolysis of bound GTP and hence returns RAP1 to the inactive state whereas other proteins, guanine nucleotide exchange factors (GEFs), act as RAP1 activators by facilitating the conversion of RAP1 from the GDP- to the GTP-bound form. In general, ras subfamily proteins, such as RAP1, play key roles in receptor-linked signaling pathways that control cell growth and differentiation. RAP1 plays a role in diverse processes such as cell proliferation, adhesion, differentiation, and embryogenesis. Alternative splicing results in multiple transcript variants encoding distinct proteins. RAP1 GTPase activating protein 5909 NA
MYH11 ENSG00000133392 The protein encoded by this gene is a smooth muscle myosin belonging to the myosin heavy chain family. The gene product is a subunit of a hexameric protein that consists of two heavy chain subunits and two pairs of non-identical light chain subunits. It functions as a major contractile protein, converting chemical energy into mechanical energy through the hydrolysis of ATP. The gene encoding a human ortholog of rat NUDE1 is transcribed from the reverse strand of this gene, and its 3’ end overlaps with that of the latter. The pericentric inversion of chromosome 16 [inv(16)(p13q22)] produces a chimeric transcript that encodes a protein consisting of the first 165 residues from the N terminus of core-binding factor beta in a fusion with the C-terminal portion of the smooth muscle myosin heavy chain. This chromosomal rearrangement is associated with acute myeloid leukemia of the M4Eo subtype. Alternative splicing generates isoforms that are differentially expressed, with ratios changing during muscle cell maturation. Alternatively spliced transcript variants encoding different isoforms have been identified. myosin, heavy chain 11, smooth muscle 4629 NA
FN1 ENSG00000115414 This gene encodes fibronectin, a glycoprotein present in a soluble dimeric form in plasma, and in a dimeric or multimeric form at the cell surface and in extracellular matrix. The encoded preproprotein is proteolytically processed to generate the mature protein. Fibronectin is involved in cell adhesion and migration processes including embryogenesis, wound healing, blood coagulation, host defense, and metastasis. The gene has three regions subject to alternative splicing, with the potential to produce 20 different transcript variants, at least one of which encodes an isoform that undergoes proteolytic processing. The full-length nature of some variants has not been determined. fibronectin 1 2335 NA
AHNAK ENSG00000124942 NA AHNAK nucleoprotein 79026 NA
NEAT1 ENSG00000245532 This gene produces a long non-coding RNA (lncRNA) transcribed from the multiple endocrine neoplasia locus. This lncRNA is retained in the nucleus where it forms the core structural component of the paraspeckle sub-organelles. It may act as a transcriptional regulator for numerous genes, including some genes involved in cancer progression. nuclear paraspeckle assembly transcript 1 (non-protein coding) 283131 NA
TPT1 ENSG00000133112 NA tumor protein, translationally-controlled 1 7178 NA
GAPDH ENSG00000111640 This gene encodes a member of the glyceraldehyde-3-phosphate dehydrogenase protein family. The encoded protein has been identified as a moonlighting protein based on its ability to perform mechanistically distinct functions. The product of this gene catalyzes an important energy-yielding step in carbohydrate metabolism, the reversible oxidative phosphorylation of glyceraldehyde-3-phosphate in the presence of inorganic phosphate and nicotinamide adenine dinucleotide (NAD). The encoded protein has additionally been identified to have uracil DNA glycosylase activity in the nucleus. Also, this protein contains a peptide that has antimicrobial activity against E. coli, P. aeruginosa, and C. albicans. Studies of a similar protein in mouse have assigned a variety of additional functions including nitrosylation of nuclear proteins, the regulation of mRNA stability, and acting as a transferrin receptor on the cell surface of macrophage. Many pseudogenes similar to this locus are present in the human genome. Alternative splicing results in multiple transcript variants. glyceraldehyde-3-phosphate dehydrogenase 2597 NA
HSP90B1 ENSG00000166598 This gene encodes a member of a family of adenosine triphosphate(ATP)-metabolizing molecular chaperones with roles in stabilizing and folding other proteins. The encoded protein is localized to melanosomes and the endoplasmic reticulum. Expression of this protein is associated with a variety of pathogenic states, including tumor formation. There is a microRNA gene located within the 5’ exon of this gene. There are pseudogenes for this gene on chromosomes 1 and 15. heat shock protein 90kDa beta family member 1 7184 NA
ALDOA ENSG00000149925 The protein encoded by this gene, Aldolase A (fructose-bisphosphate aldolase), is a glycolytic enzyme that catalyzes the reversible conversion of fructose-1,6-bisphosphate to glyceraldehyde 3-phosphate and dihydroxyacetone phosphate. Three aldolase isozymes (A, B, and C), encoded by three different genes, are differentially expressed during development. Aldolase A is found in the developing embryo and is produced in even greater amounts in adult muscle. Aldolase A expression is repressed in adult liver, kidney and intestine and similar to aldolase C levels in brain and other nervous tissue. Aldolase A deficiency has been associated with myopathy and hemolytic anemia. Alternative splicing and alternative promoter usage results in multiple transcript variants. Related pseudogenes have been identified on chromosomes 3 and 10. aldolase, fructose-bisphosphate A 226 NA
LIPG ENSG00000101670 The protein encoded by this gene has substantial phospholipase activity and may be involved in lipoprotein metabolism and vascular biology. This protein is designated a member of the TG lipase family by its sequence and characteristic lid region which provides substrate specificity for enzymes of the TG lipase family. lipase G, endothelial type 9388 NA
GNAS ENSG00000087460 This locus has a highly complex imprinted expression pattern. It gives rise to maternally, paternally, and biallelically expressed transcripts that are derived from four alternative promoters and 5’ exons. Some transcripts contain a differentially methylated region (DMR) at their 5’ exons, and this DMR is commonly found in imprinted genes and correlates with transcript expression. An antisense transcript is produced from an overlapping locus on the opposite strand. One of the transcripts produced from this locus, and the antisense transcript, are paternally expressed noncoding RNAs, and may regulate imprinting in this region. In addition, one of the transcripts contains a second overlapping ORF, which encodes a structurally unrelated protein - Alex. Alternative splicing of downstream exons is also observed, which results in different forms of the stimulatory G-protein alpha subunit, a key element of the classical signal transduction pathway linking receptor-ligand interactions with the activation of adenylyl cyclase and a variety of cellular responses. Multiple transcript variants encoding different isoforms have been found for this gene. Mutations in this gene result in pseudohypoparathyroidism type 1a, pseudohypoparathyroidism type 1b, Albright hereditary osteodystrophy, pseudopseudohypoparathyroidism, McCune-Albright syndrome, progressive osseous heteroplasia, polyostotic fibrous dysplasia of bone, and some pituitary tumors. GNAS complex locus 2778 NA
CTSB ENSG00000164733 This gene encodes a member of the C1 family of peptidases. Alternative splicing of this gene results in multiple transcript variants. At least one of these variants encodes a preproprotein that is proteolytically processed to generate multiple protein products. These products include the cathepsin B light and heavy chains, which can dimerize to form the double chain form of the enzyme. This enzyme is a lysosomal cysteine protease with both endopeptidase and exopeptidase activity that may play a role in protein turnover. It is also known as amyloid precursor protein secretase and is involved in the proteolytic processing of amyloid precursor protein (APP). Incomplete proteolytic processing of APP has been suggested to be a causative factor in Alzheimer’s disease, the most common cause of dementia. Overexpression of the encoded protein has been associated with esophageal adenocarcinoma and other tumors. Multiple pseudogenes of this gene have been identified. cathepsin B 1508 NA
SORD ENSG00000140263 Sorbitol dehydrogenase (SORD; EC 1.1.1.14) catalyzes the interconversion of polyols and their corresponding ketoses, and together with aldose reductase (ALDR1; MIM 103880), makes up the sorbitol pathway that is believed to play an important role in the development of diabetic complications (summarized by Carr and Markham, 1995 [PubMed 8535074]). The first reaction of the pathway (also called the polyol pathway) is the reduction of glucose to sorbitol by ALDR1 with NADPH as the cofactor. SORD then oxidizes the sorbitol to fructose using NAD(+) cofactor. sorbitol dehydrogenase 6652 NA
ANXA1 ENSG00000135046 This gene encodes a membrane-localized protein that binds phospholipids. This protein inhibits phospholipase A2 and has anti-inflammatory activity. Loss of function or expression of this gene has been detected in multiple tumors. annexin A1 301 NA
APP ENSG00000142192 This gene encodes a cell surface receptor and transmembrane precursor protein that is cleaved by secretases to form a number of peptides. Some of these peptides are secreted and can bind to the acetyltransferase complex APBB1/TIP60 to promote transcriptional activation, while others form the protein basis of the amyloid plaques found in the brains of patients with Alzheimer disease. In addition, two of the peptides are antimicrobial peptides, having been shown to have bacteriocidal and antifungal activities. Mutations in this gene have been implicated in autosomal dominant Alzheimer disease and cerebroarterial amyloidosis (cerebral amyloid angiopathy). Multiple transcript variants encoding several different isoforms have been found for this gene. amyloid beta precursor protein 351 NA
CALR ENSG00000179218 Calreticulin is a multifunctional protein that acts as a major Ca(2+)-binding (storage) protein in the lumen of the endoplasmic reticulum. It is also found in the nucleus, suggesting that it may have a role in transcription regulation. Calreticulin binds to the synthetic peptide KLGFFKR, which is almost identical to an amino acid sequence in the DNA-binding domain of the superfamily of nuclear receptors. Calreticulin binds to antibodies in certain sera of systemic lupus and Sjogren patients which contain anti-Ro/SSA antibodies, it is highly conserved among species, and it is located in the endoplasmic and sarcoplasmic reticulum where it may bind calcium. The amino terminus of calreticulin interacts with the DNA-binding domain of the glucocorticoid receptor and prevents the receptor from binding to its specific glucocorticoid response element. Calreticulin can inhibit the binding of androgen receptor to its hormone-responsive DNA element and can inhibit androgen receptor and retinoic acid receptor transcriptional activities in vivo, as well as retinoic acid-induced neuronal differentiation. Thus, calreticulin can act as an important modulator of the regulation of gene transcription by nuclear hormone receptors. Systemic lupus erythematosus is associated with increased autoantibody titers against calreticulin but calreticulin is not a Ro/SS-A antigen. Earlier papers referred to calreticulin as an Ro/SS-A antigen but this was later disproven. Increased autoantibody titer against human calreticulin is found in infants with complete congenital heart block of both the IgG and IgM classes. calreticulin 811 NA
EPCAM ENSG00000119888 This gene encodes a carcinoma-associated antigen and is a member of a family that includes at least two type I membrane proteins. This antigen is expressed on most normal epithelial cells and gastrointestinal carcinomas and functions as a homotypic calcium-independent cell adhesion molecule. The antigen is being used as a target for immunotherapy treatment of human carcinomas. Mutations in this gene result in congenital tufting enteropathy. epithelial cell adhesion molecule 4072 NA
PLEKHH1 ENSG00000054690 NA pleckstrin homology, MyTH4 and FERM domain containing H1 57475 NA
TFF3 ENSG00000160180 Members of the trefoil family are characterized by having at least one copy of the trefoil motif, a 40-amino acid domain that contains three conserved disulfides. They are stable secretory proteins expressed in gastrointestinal mucosa. Their functions are not defined, but they may protect the mucosa from insults, stabilize the mucus layer and affect healing of the epithelium. This gene is expressed in goblet cells of the intestines and colon. This gene and two other related trefoil family member genes are found in a cluster on chromosome 21. trefoil factor 3 7033 NA
TPM3 ENSG00000143549 This gene encodes a member of the tropomyosin family of actin-binding proteins. Tropomyosins are dimers of coiled-coil proteins that provide stability to actin filaments and regulate access of other actin-binding proteins. Mutations in this gene result in autosomal dominant nemaline myopathy and other muscle disorders. This locus is involved in translocations with other loci, including anaplastic lymphoma receptor tyrosine kinase (ALK) and neurotrophic tyrosine kinase receptor type 1 (NTRK1), which result in the formation of fusion proteins that act as oncogenes. There are numerous pseudogenes for this gene on different chromosomes. Alternative splicing results in multiple transcript variants. tropomyosin 3 7170 NA
RGL3 ENSG00000205517 NA ral guanine nucleotide dissociation stimulator like 3 57139 NA
TAGLN ENSG00000149591 The protein encoded by this gene is a transformation and shape-change sensitive actin cross-linking/gelling protein found in fibroblasts and smooth muscle. Its expression is down-regulated in many cell lines, and this down-regulation may be an early and sensitive marker for the onset of transformation. A functional role of this protein is unclear. Two transcript variants encoding the same protein have been found for this gene. transgelin 6876 NA
NPNT ENSG00000168743 NA nephronectin 255743 NA
ADGRG1 ENSG00000205336 This gene encodes a member of the G protein-coupled receptor family and regulates brain cortical patterning. The encoded protein binds specifically to transglutaminase 2, a component of tissue and tumor stroma implicated as an inhibitor of tumor progression. Mutations in this gene are associated with a brain malformation known as bilateral frontoparietal polymicrogyria. Alternative splicing results in multiple transcript variants. adhesion G protein-coupled receptor G1 9289 NA
GOLGA8B ENSG00000215252 NA golgin A8 family member B 440270 NA
GOLGA8A ENSG00000215252 The Golgi apparatus, which participates in glycosylation and transport of proteins and lipids in the secretory pathway, consists of a series of stacked, flattened membrane sacs referred to as cisternae. Interactions between the Golgi and microtubules are thought to be important for the reorganization of the Golgi after it fragments during mitosis. The golgins constitute a family of proteins which are localized to the Golgi. This gene encodes a golgin which structurally resembles its family member GOLGA2, suggesting that they may share a similar function. There are many similar copies of this gene on chromosome 15. Alternative splicing results in multiple transcript variants. golgin A8 family member A 23015 NA
IVD ENSG00000128928 Isovaleryl-CoA dehydrogenase (IVD) is a mitochondrial matrix enzyme that catalyzes the third step in leucine catabolism. The genetic deficiency of IVD results in an accumulation of isovaleric acid, which is toxic to the central nervous system and leads to isovaleric acidemia. Alternatively spliced transcript variants encoding different isoforms have been found for this gene. isovaleryl-CoA dehydrogenase 3712 NA
MTCH1 ENSG00000137409 This gene encodes a member of the mitochondrial carrier family. The encoded protein is localized to the mitochondrion inner membrane and induces apoptosis independent of the proapoptotic proteins Bax and Bak. Pseudogenes on chromosomes 6 and 11 have been identified for this gene. Alternatively spliced transcript variants encoding multiple isoforms have been observed. mitochondrial carrier 1 23787 NA
APOE ENSG00000130203 The protein encoded by this gene is a major apoprotein of the chylomicron. It binds to a specific liver and peripheral cell receptor, and is essential for the normal catabolism of triglyceride-rich lipoprotein constituents. This gene maps to chromosome 19 in a cluster with the related apolipoprotein C1 and C2 genes. Mutations in this gene result in familial dysbetalipoproteinemia, or type III hyperlipoproteinemia (HLP III), in which increased plasma cholesterol and triglycerides are the consequence of impaired clearance of chylomicron and VLDL remnants. Alternative splicing results in multiple transcript variants. apolipoprotein E 348 NA
PEBP1 ENSG00000089220 This gene encodes a member of the phosphatidylethanolamine-binding family of proteins and has been shown to modulate multiple signaling pathways, including the MAP kinase (MAPK), NF-kappa B, and glycogen synthase kinase-3 (GSK-3) signaling pathways. The encoded protein can be further processed to form a smaller cleavage product, hippocampal cholinergic neurostimulating peptide (HCNP), which may be involved in neural development. This gene has been implicated in numerous human cancers and may act as a metastasis suppressor gene. Multiple pseudogenes of this gene have been identified in the genome. phosphatidylethanolamine binding protein 1 5037 NA
GFAP ENSG00000131095 This gene encodes one of the major intermediate filament proteins of mature astrocytes. It is used as a marker to distinguish astrocytes from other glial cells during development. Mutations in this gene cause Alexander disease, a rare disorder of astrocytes in the central nervous system. Alternative splicing results in multiple transcript variants encoding distinct isoforms. glial fibrillary acidic protein 2670 NA
SYNPO2 ENSG00000172403 NA synaptopodin 2 171024 NA
INPP5J ENSG00000185133 NA inositol polyphosphate-5-phosphatase J 27124 NA
AGRN ENSG00000188157 This gene encodes one of several proteins that are critical in the development of the neuromuscular junction (NMJ), as identified in mouse knock-out studies. The encoded protein contains several laminin G, Kazal type serine protease inhibitor, and epidermal growth factor domains. Additional post-translational modifications occur to add glycosaminoglycans and disulfide bonds. In one family with congenital myasthenic syndrome affecting limb-girdle muscles, a mutation in this gene was found. Alternative splicing results in multiple transcript variants encoding different isoforms. agrin 375790 NA
MKNK2 ENSG00000099875 This gene encodes a member of the calcium/calmodulin-dependent protein kinases (CAMK) Ser/Thr protein kinase family, which belongs to the protein kinase superfamily. This protein contains conserved DLG (asp-leu-gly) and ENIL (glu-asn-ile-leu) motifs, and an N-terminal polybasic region which binds importin A and the translation factor scaffold protein eukaryotic initiation factor 4G (eIF4G). This protein is one of the downstream kinases activated by mitogen-activated protein (MAP) kinases. It phosphorylates the eukaryotic initiation factor 4E (eIF4E), thus playing important roles in the initiation of mRNA translation, oncogenic transformation and malignant cell proliferation. In addition to eIF4E, this protein also interacts with von Hippel-Lindau tumor suppressor (VHL), ring-box 1 (Rbx1) and Cullin2 (Cul2), which are all components of the CBC(VHL) ubiquitin ligase E3 complex. Multiple alternatively spliced transcript variants have been found, but the full-length nature and biological activity of only two variants are determined. These two variants encode distinct isoforms which differ in activity and regulation, and in subcellular localization. MAP kinase interacting serine/threonine kinase 2 2872 NA
ATP8A1 ENSG00000124406 The P-type adenosinetriphosphatases (P-type ATPases) are a family of proteins which use the free energy of ATP hydrolysis to drive uphill transport of ions across membranes. Several subfamilies of P-type ATPases have been identified. One subfamily catalyzes transport of heavy metal ions. Another subfamily transports non-heavy metal ions (NMHI). The protein encoded by this gene is a member of the third subfamily of P-type ATPases and acts to transport amphipaths, such as phosphatidylserine. Two transcript variants encoding different isoforms have been found for this gene. ATPase phospholipid transporting 8A1 10396 NA
S100A9 ENSG00000163220 The protein encoded by this gene is a member of the S100 family of proteins containing 2 EF-hand calcium-binding motifs. S100 proteins are localized in the cytoplasm and/or nucleus of a wide range of cells, and involved in the regulation of a number of cellular processes such as cell cycle progression and differentiation. S100 genes include at least 13 members which are located as a cluster on chromosome 1q21. This protein may function in the inhibition of casein kinase and altered expression of this protein is associated with the disease cystic fibrosis. This antimicrobial protein exhibits antifungal and antibacterial activity. S100 calcium binding protein A9 6280 NA
MT1F ENSG00000198417 NA metallothionein 1F 4494 NA
KRT8 ENSG00000170421 This gene is a member of the type II keratin family clustered on the long arm of chromosome 12. Type I and type II keratins heteropolymerize to form intermediate-sized filaments in the cytoplasm of epithelial cells. The product of this gene typically dimerizes with keratin 18 to form an intermediate filament in simple single-layered epithelial cells. This protein plays a role in maintaining cellular structural integrity and also functions in signal transduction and cellular differentiation. Mutations in this gene cause cryptogenic cirrhosis. Alternatively spliced transcript variants have been found for this gene. keratin 8 3856 NA
FBXL16 ENSG00000127585 Members of the F-box protein family, such as FBXL16, are characterized by an approximately 40-amino acid F-box motif. SCF complexes, formed by SKP1 (MIM 601434), cullin (see CUL1; MIM 603134), and F-box proteins, act as protein-ubiquitin ligases. F-box proteins interact with SKP1 through the F box, and they interact with ubiquitination targets through other protein interaction domains (Jin et al., 2004 [PubMed 15520277]). F-box and leucine rich repeat protein 16 146330 NA
AFAP1L2 ENSG00000169129 NA actin filament associated protein 1 like 2 84632 NA
RASSF4 ENSG00000107551 The function of this gene has not yet been determined but may involve a role in tumor suppression. Alternative splicing of this gene results in several transcript variants; however, most of the variants have not been fully described. Ras association domain family member 4 83937 NA
SELENBP1 ENSG00000143416 This gene encodes a member of the selenium-binding protein family. Selenium is an essential nutrient that exhibits potent anticarcinogenic properties, and deficiency of selenium may cause certain neurologic diseases. The effects of selenium in preventing cancer and neurologic diseases may be mediated by selenium-binding proteins, and decreased expression of this gene may be associated with several types of cancer. The encoded protein may play a selenium-dependent role in ubiquitination/deubiquitination-mediated protein degradation. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene. selenium binding protein 1 8991 NA
TPM1 ENSG00000140416 This gene is a member of the tropomyosin family of highly conserved, widely distributed actin-binding proteins involved in the contractile system of striated and smooth muscles and the cytoskeleton of non-muscle cells. Tropomyosin is composed of two alpha-helical chains arranged as a coiled-coil. It is polymerized end to end along the two grooves of actin filaments and provides stability to the filaments. The encoded protein is one type of alpha helical chain that forms the predominant tropomyosin of striated muscle, where it also functions in association with the troponin complex to regulate the calcium-dependent interaction of actin and myosin during muscle contraction. In smooth muscle and non-muscle cells, alternatively spliced transcript variants encoding a range of isoforms have been described. Mutations in this gene are associated with type 3 familial hypertrophic cardiomyopathy. tropomyosin 1 (alpha) 7168 NA
MT1G ENSG00000125144 NA metallothionein 1G 4495 NA
LOC100129518 ENSG00000112096 NA uncharacterized LOC100129518 100129518 NA
SOD2 ENSG00000112096 This gene is a member of the iron/manganese superoxide dismutase family. It encodes a mitochondrial protein that forms a homotetramer and binds one manganese ion per subunit. This protein binds to the superoxide byproducts of oxidative phosphorylation and converts them to hydrogen peroxide and diatomic oxygen. Mutations in this gene have been associated with idiopathic cardiomyopathy (IDC), premature aging, sporadic motor neuron disease, and cancer. Alternative splicing of this gene results in multiple transcript variants. A related pseudogene has been identified on chromosome 1. superoxide dismutase 2, mitochondrial 6648 NA
HSPB6 ENSG00000004776 This locus encodes a heat shock protein. The encoded protein likely plays a role in smooth muscle relaxation. heat shock protein family B (small) member 6 126393 NA
KIF5A ENSG00000155980 This gene encodes a member of the kinesin family of proteins. Members of this family are part of a multisubunit complex that functions as a microtubule motor in intracellular organelle transport. Mutations in this gene cause autosomal dominant spastic paraplegia 10. kinesin family member 5A 3798 NA
H19 ENSG00000130600 This gene is located in an imprinted region of chromosome 11 near the insulin-like growth factor 2 (IGF2) gene. This gene is only expressed from the maternally-inherited chromosome, whereas IGF2 is only expressed from the paternally-inherited chromosome. The product of this gene is a long non-coding RNA which functions as a tumor suppressor. Mutations in this gene have been associated with Beckwith-Wiedemann Syndrome and Wilms tumorigenesis. Alternative splicing results in multiple transcript variants. H19, imprinted maternally expressed transcript (non-protein coding) 283120 NA
EPOR ENSG00000187266 This gene encodes the erythropoietin receptor which is a member of the cytokine receptor family. Upon erythropoietin binding, this receptor activates Jak2 tyrosine kinase which activates different intracellular pathways including: Ras/MAP kinase, phosphatidylinositol 3-kinase and STAT transcription factors. The stimulated erythropoietin receptor appears to have a role in erythroid cell survival. Defects in the erythropoietin receptor may produce erythroleukemia and familial erythrocytosis. Dysregulation of this gene may affect the growth of certain tumors. Alternate splicing results in multiple transcript variants. erythropoietin receptor 2057 NA
SLC4A11 ENSG00000088836 This gene encodes a voltage-regulated, electrogenic sodium-coupled borate cotransporter that is essential for borate homeostasis, cell growth and cell proliferation. Mutations in this gene have been associated with a number of endothelial corneal dystrophies including recessive corneal endothelial dystrophy 2, corneal dystrophy and perceptive deafness, and Fuchs endothelial corneal dystrophy. Multiple transcript variants encoding different isoforms have been described. solute carrier family 4 member 11 83959 NA
RHPN1 ENSG00000158106 NA rhophilin, Rho GTPase binding protein 1 114822 NA
MARCKSL1 ENSG00000175130 This gene encodes a member of the myristoylated alanine-rich C-kinase substrate (MARCKS) family. Members of this family play a role in cytoskeletal regulation, protein kinase C signaling and calmodulin signaling. The encoded protein affects the formation of adherens junction. Alternative splicing results in multiple transcript variants. Pseudogenes of this gene are located on the long arm of chromosomes 6 and 10. MARCKS like 1 65108 NA
COL23A1 ENSG00000050767 COL23A1 is a member of the transmembrane collagens, a subfamily of the nonfibrillar collagens that contain a single pass hydrophobic transmembrane domain (Banyard et al., 2003 [PubMed 12644459]). collagen type XXIII alpha 1 chain 91522 NA
PLVAP ENSG00000130300 NA plasmalemma vesicle associated protein 83483 NA
ITM2C ENSG00000135916 NA integral membrane protein 2C 81618 NA
FAM129A ENSG00000135842 NA family with sequence similarity 129 member A 116496 NA
COL1A1 ENSG00000108821 This gene encodes the pro-alpha1 chains of type I collagen whose triple helix comprises two alpha1 chains and one alpha2 chain. Type I is a fibril-forming collagen found in most connective tissues and is abundant in bone, cornea, dermis and tendon. Mutations in this gene are associated with osteogenesis imperfecta types I-IV, Ehlers-Danlos syndrome type VIIA, Ehlers-Danlos syndrome Classical type, Caffey Disease and idiopathic osteoporosis. Reciprocal translocations between chromosomes 17 and 22, where this gene and the gene for platelet-derived growth factor beta are located, are associated with a particular type of skin tumor called dermatofibrosarcoma protuberans, resulting from unregulated expression of the growth factor. Two transcripts, resulting from the use of alternate polyadenylation signals, have been identified for this gene. collagen type I alpha 1 1277 NA
FOSL2 ENSG00000075426 The Fos gene family consists of 4 members: FOS, FOSB, FOSL1, and FOSL2. These genes encode leucine zipper proteins that can dimerize with proteins of the JUN family, thereby forming the transcription factor complex AP-1. As such, the FOS proteins have been implicated as regulators of cell proliferation, differentiation, and transformation. FOS like 2, AP-1 transcription factor subunit 2355 NA
PARM1 ENSG00000169116 NA prostate androgen-regulated mucin-like protein 1 25849 NA
CCL21 ENSG00000137077 This antimicrobial gene is one of several CC cytokine genes clustered on the p-arm of chromosome 9. Cytokines are a family of secreted proteins involved in immunoregulatory and inflammatory processes. The CC cytokines are proteins characterized by two adjacent cysteines. Similar to other chemokines the protein encoded by this gene inhibits hemopoiesis and stimulates chemotaxis. This protein is chemotactic in vitro for thymocytes and activated T cells, but not for B cells, macrophages, or neutrophils. The cytokine encoded by this gene may also play a role in mediating homing of lymphocytes to secondary lymphoid organs. It is a high affinity functional ligand for chemokine receptor 7 that is expressed on T and B lymphocytes and a known receptor for another member of the cytokine family (small inducible cytokine A19). C-C motif chemokine ligand 21 6366 NA
MYL9 ENSG00000101335 Myosin, a structural component of muscle, consists of two heavy chains and four light chains. The protein encoded by this gene is a myosin light chain that may regulate muscle contraction by modulating the ATPase activity of myosin heads. The encoded protein binds calcium and is activated by myosin light chain kinase. Two transcript variants encoding different isoforms have been found for this gene. myosin light chain 9 10398 NA
ITPR3 ENSG00000096433 This gene encodes a receptor for inositol 1,4,5-trisphosphate, a second messenger that mediates the release of intracellular calcium. The receptor contains a calcium channel at the C-terminus and the ligand-binding site at the N-terminus. Knockout studies in mice suggest that type 2 and type 3 inositol 1,4,5-trisphosphate receptors play a key role in exocrine secretion underlying energy metabolism and growth. inositol 1,4,5-trisphosphate receptor type 3 3710 NA
SDC2 ENSG00000169439 The protein encoded by this gene is a transmembrane (type I) heparan sulfate proteoglycan and is a member of the syndecan proteoglycan family. The syndecans mediate cell binding, cell signaling, and cytoskeletal organization and syndecan receptors are required for internalization of the HIV-1 tat protein. The syndecan-2 protein functions as an integral membrane protein and participates in cell proliferation, cell migration and cell-matrix interactions via its receptor for extracellular matrix proteins. Altered syndecan-2 expression has been detected in several different tumor types. syndecan 2 6383 NA
STMN3 ENSG00000197457 This gene encodes a protein which is a member of the stathmin protein family. Members of this protein family form a complex with tubulins at a ratio of 2 tubulins for each stathmin protein. Microtubules require the ordered assembly of alpha- and beta-tubulins, and formation of a complex with stathmin disrupts microtubule formation and function. A pseudogene of this gene is located on chromosome 22. Alternative splicing results in multiple transcript variants. stathmin 3 50861 NA
PDK4 ENSG00000004799 This gene is a member of the PDK/BCKDK protein kinase family and encodes a mitochondrial protein with a histidine kinase domain. This protein is located in the matrix of the mitochondria and inhibits the pyruvate dehydrogenase complex by phosphorylating one of its subunits, thereby contributing to the regulation of glucose metabolism. Expression of this gene is regulated by glucocorticoids, retinoic acid and insulin. pyruvate dehydrogenase kinase 4 5166 NA
B2M ENSG00000166710 This gene encodes a serum protein found in association with the major histocompatibility complex (MHC) class I heavy chain on the surface of nearly all nucleated cells. The protein has a predominantly beta-pleated sheet structure that can form amyloid fibrils in some pathological conditions. The encoded antimicrobial protein displays antibacterial activity in amniotic fluid. A mutation in this gene has been shown to result in hypercatabolic hypoproteinemia. beta-2-microglobulin 567 NA
MCAM ENSG00000076706 NA melanoma cell adhesion molecule 4162 NA
COL3A1 ENSG00000168542 This gene encodes the pro-alpha1 chains of type III collagen, a fibrillar collagen that is found in extensible connective tissues such as skin, lung, uterus, intestine and the vascular system, frequently in association with type I collagen. Mutations in this gene are associated with Ehlers-Danlos syndrome types IV, and with aortic and arterial aneurysms. Two transcripts, resulting from the use of alternate polyadenylation signals, have been identified for this gene. collagen type III alpha 1 chain 1281 NA
FAM107A ENSG00000168309 NA family with sequence similarity 107 member A 11170 NA
TLN1 ENSG00000137076 This gene encodes a cytoskeletal protein that is concentrated in areas of cell-substratum and cell-cell contacts. The encoded protein plays a significant role in the assembly of actin filaments and in spreading and migration of various cell types, including fibroblasts and osteoclasts. It codistributes with integrins in the cell surface membrane in order to assist in the attachment of adherent cells to extracellular matrices and of lymphocytes to other cells. The N-terminus of this protein contains elements for localization to cell-extracellular matrix junctions. The C-terminus contains binding sites for proteins such as beta-1-integrin, actin, and vinculin. talin 1 7094 NA
VEGFA ENSG00000112715 This gene is a member of the PDGF/VEGF growth factor family. It encodes a heparin-binding protein, which exists as a disulfide-linked homodimer. This growth factor induces proliferation and migration of vascular endothelial cells, and is essential for both physiological and pathological angiogenesis. Disruption of this gene in mice resulted in abnormal embryonic blood vessel formation. This gene is upregulated in many known tumors and its expression is correlated with tumor stage and progression. Elevated levels of this protein are found in patients with POEMS syndrome, also known as Crow-Fukase syndrome. Allelic variants of this gene have been associated with microvascular complications of diabetes 1 (MVCD1) and atherosclerosis. Alternatively spliced transcript variants encoding different isoforms have been described. There is also evidence for alternative translation initiation from upstream non-AUG (CUG) codons resulting in additional isoforms. A recent study showed that a C-terminally extended isoform is produced by use of an alternative in-frame translation termination codon via a stop codon readthrough mechanism, and that this isoform is antiangiogenic. Expression of some isoforms derived from the AUG start codon is regulated by a small upstream open reading frame, which is located within an internal ribosome entry site. vascular endothelial growth factor A 7422 NA
ID4 ENSG00000172201 This gene encodes a member of the inhibitor of DNA binding (ID) protein family. These proteins are basic helix-loop-helix transcription factors which can act as tumor suppressors but lack DNA binding activity. Consequently, the activity of the encoded protein depends on the protein binding partner. inhibitor of DNA binding 4, HLH protein 3400 NA
PCP4 ENSG00000183036 NA Purkinje cell protein 4 5121 NA
ARAP2 ENSG00000047365 The protein encoded by this gene contains ARF-GAP, RHO-GAP, ankyrin repeat, RAS-associating, and pleckstrin homology domains. The protein is a phosphatidylinositol (3,4,5)-trisphosphate-dependent Arf6 GAP that binds RhoA-GTP, but it lacks the predicted catalytic arginine in the RHO-GAP domain and does not have RHO-GAP activity. The protein associates with focal adhesions and functions downstream of RhoA to regulate focal adhesion dynamics. ArfGAP with RhoGAP domain, ankyrin repeat and PH domain 2 116984 NA
ECM1 ENSG00000143369 This gene encodes a soluble protein that is involved in endochondral bone formation, angiogenesis, and tumor biology. It also interacts with a variety of extracellular and structural proteins, contributing to the maintenance of skin integrity and homeostasis. Mutations in this gene are associated with lipoid proteinosis disorder (also known as hyalinosis cutis et mucosae or Urbach-Wiethe disease) that is characterized by generalized thickening of skin, mucosae and certain viscera. Alternatively spliced transcript variants encoding distinct isoforms have been described for this gene. extracellular matrix protein 1 1893 NA
COL1A2 ENSG00000164692 This gene encodes the pro-alpha2 chain of type I collagen whose triple helix comprises two alpha1 chains and one alpha2 chain. Type I is a fibril-forming collagen found in most connective tissues and is abundant in bone, cornea, dermis and tendon. Mutations in this gene are associated with osteogenesis imperfecta types I-IV, Ehlers-Danlos syndrome type VIIB, recessive Ehlers-Danlos syndrome Classical type, idiopathic osteoporosis, and atypical Marfan syndrome. Symptoms associated with mutations in this gene, however, tend to be less severe than mutations in the gene for the alpha1 chain of type I collagen (COL1A1) reflecting the different role of alpha2 chains in matrix integrity. Three transcripts, resulting from the use of alternate polyadenylation signals, have been identified for this gene. collagen type I alpha 2 chain 1278 NA
RP11-290D2.6 ENSG00000273149 NA NA ENSG00000273149 NA
SNX1 ENSG00000028528 This gene encodes a member of the sorting nexin family. Members of this family contain a phox (PX) domain, which is a phosphoinositide binding domain, and are involved in intracellular trafficking. This endosomal protein regulates the cell-surface expression of epidermal growth factor receptor. This protein also has a role in sorting protease-activated receptor-1 from early endosomes to lysosomes. This protein may form oligomeric complexes with family members. This gene results in three transcript variants encoding distinct isoforms. sorting nexin 1 6642 NA
CMTM4 ENSG00000183723 This gene belongs to the chemokine-like factor gene superfamily, a novel family that is similar to the chemokine and the transmembrane 4 superfamilies of signaling molecules. This gene is one of several chemokine-like factor genes located in a cluster on chromosome 16. Alternatively spliced transcript variants encoding different isoforms have been identified. CKLF like MARVEL transmembrane domain containing 4 146223 NA
DCN ENSG00000011465 This gene encodes a member of the small leucine-rich proteoglycan family of proteins. Alternative splicing results in multiple transcript variants, at least one of which encodes a preproprotein that is proteolytically processed to generate the mature protein. This protein plays a role in collagen fibril assembly. Binding of this protein to multiple cell surface receptors mediates its role in tumor suppression, including a stimulatory effect on autophagy and inflammation and an inhibitory effect on angiogenesis and tumorigenesis. This gene and the related gene biglycan are thought to be the result of a gene duplication. Mutations in this gene are associated with congenital stromal corneal dystrophy in human patients. decorin 1634 NA
COL4A2 ENSG00000134871 This gene encodes one of the six subunits of type IV collagen, the major structural component of basement membranes. The C-terminal portion of the protein, known as canstatin, is an inhibitor of angiogenesis and tumor growth. Like the other members of the type IV collagen gene family, this gene is organized in a head-to-head conformation with another type IV collagen gene so that each gene pair shares a common promoter. collagen type IV alpha 2 1284 NA
ACTA2 ENSG00000107796 The protein encoded by this gene belongs to the actin family of proteins, which are highly conserved proteins that play a role in cell motility, structure and integrity. Alpha, beta and gamma actin isoforms have been identified, with alpha actins being a major constituent of the contractile apparatus, while beta and gamma actins are involved in the regulation of cell motility. This actin is an alpha actin that is found in skeletal muscle. Defects in this gene cause aortic aneurysm familial thoracic type 6. Multiple alternatively spliced variants, encoding the same protein, have been identified. actin, alpha 2, smooth muscle, aorta 59 NA
MYH7 ENSG00000092054 Muscle myosin is a hexameric protein containing 2 heavy chain subunits, 2 alkali light chain subunits, and 2 regulatory light chain subunits. This gene encodes the beta (or slow) heavy chain subunit of cardiac myosin. It is expressed predominantly in normal human ventricle. It is also expressed in skeletal muscle tissues rich in slow-twitch type I muscle fibers. Changes in the relative abundance of this protein and the alpha (or fast) heavy subunit of cardiac myosin correlate with the contractile velocity of cardiac muscle. Its expression is also altered during thyroid hormone depletion and hemodynamic overloading. Mutations in this gene are associated with familial hypertrophic cardiomyopathy, myosin storage myopathy, dilated cardiomyopathy, and Laing early-onset distal myopathy. myosin, heavy chain 7, cardiac muscle, beta 4625 NA
MBP ENSG00000197971 The protein encoded by the classic MBP gene is a major constituent of the myelin sheath of oligodendrocytes and Schwann cells in the nervous system. However, MBP-related transcripts are also present in the bone marrow and the immune system. These mRNAs arise from the long MBP gene (otherwise called ‘Golli-MBP’) that contains 3 additional exons located upstream of the classic MBP exons. Alternative splicing from the Golli and the MBP transcription start sites gives rise to 2 sets of MBP-related transcripts and gene products. The Golli mRNAs contain 3 exons unique to Golli-MBP, spliced in-frame to 1 or more MBP exons. They encode hybrid proteins that have N-terminal Golli aa sequence linked to MBP aa sequence. The second family of transcripts contain only MBP exons and produce the well characterized myelin basic proteins. This complex gene structure is conserved among species suggesting that the MBP transcription unit is an integral part of the Golli transcription unit and that this arrangement is important for the function and/or regulation of these genes. myelin basic protein 4155 NA
ACTG2 ENSG00000163017 Actins are highly conserved proteins that are involved in various types of cell motility and in the maintenance of the cytoskeleton. Three types of actins, alpha, beta and gamma, have been identified in vertebrates. Alpha actins are found in muscle tissues and are a major constituent of the contractile apparatus. The beta and gamma actins co-exist in most cell types as components of the cytoskeleton and as mediators of internal cell motility. This gene encodes actin gamma 2; a smooth muscle actin found in enteric tissues. Alternative splicing results in multiple transcript variants encoding distinct isoforms. Based on similarity to peptide cleavage of related actins, the mature protein of this gene is formed by removal of two N-terminal peptides. actin, gamma 2, smooth muscle, enteric 72 NA
SMTN ENSG00000183963 This gene encodes a structural protein that is found exclusively in contractile smooth muscle cells. It associates with stress fibers and constitutes part of the cytoskeleton. This gene is localized to chromosome 22q12.3, distal to the TUPLE1 locus and outside the DiGeorge syndrome deletion. Alternative splicing of this gene results in multiple transcript variants encoding distinct isoforms. smoothelin 6525 NA
KIAA1522 ENSG00000162522 NA KIAA1522 57648 NA
WFDC2 ENSG00000101443 This gene encodes a protein that is a member of the WFDC domain family. The WFDC domain, or WAP Signature motif, contains eight cysteines forming four disulfide bonds at the core of the protein, and functions as a protease inhibitor in many family members. This gene is expressed in pulmonary epithelial cells, and was also found to be expressed in some ovarian cancers. The encoded protein is a small secretory protein, which may be involved in sperm maturation. WAP four-disulfide core domain 2 10406 NA
SORBS1 ENSG00000095637 This gene encodes a CBL-associated protein which functions in the signaling and stimulation of insulin. Mutations in this gene may be associated with human disorders of insulin resistance. Alternative splicing results in multiple transcript variants. sorbin and SH3 domain containing 1 10580 NA
PPL ENSG00000118898 The protein encoded by this gene is a component of desmosomes and of the epidermal cornified envelope in keratinocytes. The N-terminal domain of this protein interacts with the plasma membrane and its C-terminus interacts with intermediate filaments. Through its rod domain, this protein forms complexes with envoplakin. This protein may serve as a link between the cornified envelope and desmosomes as well as intermediate filaments. AKT1/PKB, a protein kinase mediating a variety of cell growth and survival signaling processes, is reported to interact with this protein, suggesting a possible role for this protein as a localization signal in AKT1-mediated signaling. periplakin 5493 NA
NACA ENSG00000196531 This gene encodes a protein that associates with basic transcription factor 3 (BTF3) to form the nascent polypeptide-associated complex (NAC). This complex binds to nascent proteins that lack a signal peptide motif as they emerge from the ribosome, blocking interaction with the signal recognition particle (SRP) and preventing mistranslocation to the endoplasmic reticulum. This protein is an IgE autoantigen in atopic dermatitis patients. Alternative splicing results in multiple transcript variants, but the full length nature of some of these variants, including those encoding very large proteins, has not been determined. There are multiple pseudogenes of this gene on different chromosomes. nascent polypeptide-associated complex alpha subunit 4666 NA
FARP1 ENSG00000152767 This gene encodes a protein containing a FERM (4.1, ezrin, radixin, moesin) domain, a Dbl homology domain, and two pleckstrin homology domains. These domains are found in guanine nucleotide exchange factors and proteins that link the cytoskeleton to the cell membrane. The encoded protein functions in neurons to promote dendritic growth. Alternative splicing results in multiple transcript variants. FERM, ARH/RhoGEF and pleckstrin domain protein 1 10160 NA
RRBP1 ENSG00000125844 This gene encodes a ribosome-binding protein of the endoplasmic reticulum (ER) membrane. Studies suggest that this gene plays a role in ER proliferation, secretory pathways and secretory cell differentiation, and mediation of ER-microtubule interactions. Alternative splicing has been observed and protein isoforms are characterized by regions of N-terminal decapeptide and C-terminal heptad repeats. Splicing of the tandem repeats results in variations in ribosome-binding affinity and secretory function. The full-length nature of variants which differ in repeat length has not been determined. Pseudogenes of this gene have been identified on chromosomes 3 and 7, and RRBP1 has been excluded as a candidate gene in the cause of Alagille syndrome, the result of a mutation in a nearby gene on chromosome 20p12. ribosome binding protein 1 6238 NA
GOLGA8A ENSG00000175265 The Golgi apparatus, which participates in glycosylation and transport of proteins and lipids in the secretory pathway, consists of a series of stacked, flattened membrane sacs referred to as cisternae. Interactions between the Golgi and microtubules are thought to be important for the reorganization of the Golgi after it fragments during mitosis. The golgins constitute a family of proteins which are localized to the Golgi. This gene encodes a golgin which structurally resembles its family member GOLGA2, suggesting that they may share a similar function. There are many similar copies of this gene on chromosome 15. Alternative splicing results in multiple transcript variants. golgin A8 family member A 23015 NA
write.table(as.factor(out$query), paste0("../utilities/GTEX2013_sparse_load_sqrt/gene_names_clus_",10,".txt"), col.names = FALSE,
            row.names=FALSE, quote=FALSE);
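
The export step above is repeated once per factor with only the factor index changing. A minimal sketch of how it could be wrapped in a helper is given below. The function name `write_factor_genes` and its arguments `factor_id`, `dir`, and `drop_missing` are hypothetical and not part of the original analysis. The optional `drop_missing` filter assumes that `queryMany` marks unmatched IDs with `TRUE` in a `notfound` column, as in the annotation tables shown in this report.

```r
# Hypothetical helper (not from the original analysis): write the queried
# Ensembl IDs for one factor to a text file, one ID per line.
write_factor_genes <- function(out, factor_id,
                               dir = "../utilities/GTEX2013_sparse_load_sqrt",
                               drop_missing = FALSE) {
  df <- as.data.frame(out)
  keep <- rep(TRUE, nrow(df))
  # Assumption: a 'notfound' column, when present, is TRUE for IDs that
  # mygene could not match and NA otherwise.
  if (drop_missing && "notfound" %in% names(df)) {
    keep <- !(df$notfound %in% TRUE)
  }
  ids <- as.character(df$query)[keep]
  path <- file.path(dir, paste0("gene_names_clus_", factor_id, ".txt"))
  write.table(ids, path, col.names = FALSE, row.names = FALSE, quote = FALSE)
  invisible(path)
}
```

With the default `drop_missing = FALSE`, a call such as `write_factor_genes(out, 10)` should write the same IDs as the `write.table` call above. Setting `drop_missing = TRUE` is a design choice that changes the exported list, so it is left off by default.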

Factor 11 Annotations

out <- mygene::queryMany(gene_list[11,],  scopes="ensembl.gene", fields=c("name", "summary", "symbol"), species="human");
## Finished
## Pass returnall=TRUE to return lists of duplicate or missing query terms.
kable(as.data.frame(out))
name X_id summary symbol query notfound
neurogranin 4900 Neurogranin (NRGN) is the human homolog of the neuron-specific rat RC3/neurogranin gene. This gene encodes a postsynaptic protein kinase substrate that binds calmodulin in the absence of calcium. The NRGN gene contains four exons and three introns. The exons 1 and 2 encode the protein and exons 3 and 4 contain untranslated sequences. It is suggested that the NRGN is a direct target for thyroid hormone in human brain, and that control of expression of this gene could underlie many of the consequences of hypothyroidism on mental states during development as well as in adult subjects. NRGN ENSG00000154146 NA
kinesin family member 5A 3798 This gene encodes a member of the kinesin family of proteins. Members of this family are part of a multisubunit complex that functions as a microtubule motor in intracellular organelle transport. Mutations in this gene cause autosomal dominant spastic paraplegia 10. KIF5A ENSG00000155980 NA
keratin 10 3858 This gene encodes a member of the type I (acidic) cytokeratin family, which belongs to the superfamily of intermediate filament (IF) proteins. Keratins are heteropolymeric structural proteins which form the intermediate filament. These filaments, along with actin microfilaments and microtubules, compose the cytoskeleton of epithelial cells. Mutations in this gene are associated with epidermolytic hyperkeratosis. This gene is located within a cluster of keratin family members on chromosome 17q21. KRT10 ENSG00000186395 NA
vimentin 7431 This gene encodes a member of the intermediate filament family. Intermediate filaments, along with microtubules and actin microfilaments, make up the cytoskeleton. The protein encoded by this gene is responsible for maintaining cell shape, integrity of the cytoplasm, and stabilizing cytoskeletal interactions. It is also involved in the immune response, and controls the transport of low-density lipoprotein (LDL)-derived cholesterol from a lysosome to the site of esterification. It functions as an organizer of a number of critical proteins involved in attachment, migration, and cell signaling. Mutations in this gene cause a dominant, pulverulent cataract. VIM ENSG00000026025 NA
keratin 1 3848 The protein encoded by this gene is a member of the keratin gene family. The type II cytokeratins consist of basic or neutral proteins which are arranged in pairs of heterotypic keratin chains coexpressed during differentiation of simple and stratified epithelial tissues. This type II cytokeratin is specifically expressed in the spinous and granular layers of the epidermis with family member KRT10 and mutations in these genes have been associated with bullous congenital ichthyosiform erythroderma. The type II cytokeratins are clustered in a region of chromosome 12q12-q13. KRT1 ENSG00000167768 NA
glyceraldehyde-3-phosphate dehydrogenase 2597 This gene encodes a member of the glyceraldehyde-3-phosphate dehydrogenase protein family. The encoded protein has been identified as a moonlighting protein based on its ability to perform mechanistically distinct functions. The product of this gene catalyzes an important energy-yielding step in carbohydrate metabolism, the reversible oxidative phosphorylation of glyceraldehyde-3-phosphate in the presence of inorganic phosphate and nicotinamide adenine dinucleotide (NAD). The encoded protein has additionally been identified to have uracil DNA glycosylase activity in the nucleus. Also, this protein contains a peptide that has antimicrobial activity against E. coli, P. aeruginosa, and C. albicans. Studies of a similar protein in mouse have assigned a variety of additional functions including nitrosylation of nuclear proteins, the regulation of mRNA stability, and acting as a transferrin receptor on the cell surface of macrophage. Many pseudogenes similar to this locus are present in the human genome. Alternative splicing results in multiple transcript variants. GAPDH ENSG00000111640 NA
actin binding LIM protein 1 3983 This gene encodes a cytoskeletal LIM protein that binds to actin filaments via a domain that is homologous to erythrocyte dematin. LIM domains, found in over 60 proteins, play key roles in the regulation of developmental pathways. LIM domains also function as protein-binding interfaces, mediating specific protein-protein interactions. The protein encoded by this gene could mediate such interactions between actin filaments and cytoplasmic targets. Alternatively spliced transcript variants encoding different isoforms have been identified. ABLIM1 ENSG00000099204 NA
polycystin 1, transient receptor potential channel interacting 5310 This gene encodes a member of the polycystin protein family. The encoded glycoprotein contains a large N-terminal extracellular region, multiple transmembrane domains and a cytoplasmic C-tail. It is an integral membrane protein that functions as a regulator of calcium permeable cation channels and intracellular calcium homoeostasis. It is also involved in cell-cell/matrix interactions and may modulate G-protein-coupled signal-transduction pathways. It plays a role in renal tubular development, and mutations in this gene cause autosomal dominant polycystic kidney disease type 1 (ADPKD1). ADPKD1 is characterized by the growth of fluid-filled cysts that replace normal renal tissue and result in end-stage renal failure. Splice variants encoding different isoforms have been noted for this gene. Also, six pseudogenes, closely linked in a known duplicated region on chromosome 16p, have been described. PKD1 ENSG00000008710 NA
glutathione peroxidase 3 2878 This gene product belongs to the glutathione peroxidase family, which functions in the detoxification of hydrogen peroxide. It contains a selenocysteine (Sec) residue at its active site. The selenocysteine is encoded by the UGA codon, which normally signals translation termination. The 3’ UTR of Sec-containing genes have a common stem-loop structure, the sec insertion sequence (SECIS), which is necessary for the recognition of UGA as a Sec codon rather than as a stop signal. GPX3 ENSG00000211445 NA
MAM domain containing glycosylphosphatidylinositol anchor 1 266727 NA MDGA1 ENSG00000112139 NA
F-box and leucine rich repeat protein 16 146330 Members of the F-box protein family, such as FBXL16, are characterized by an approximately 40-amino acid F-box motif. SCF complexes, formed by SKP1 (MIM 601434), cullin (see CUL1; MIM 603134), and F-box proteins, act as protein-ubiquitin ligases. F-box proteins interact with SKP1 through the F box, and they interact with ubiquitination targets through other protein interaction domains (Jin et al., 2004 [PubMed 15520277]). FBXL16 ENSG00000127585 NA
cerebellin 3 precursor 643866 Members of the precerebellin family, such as CBLN3, contain a cerebellin motif (see CBLN1; MIM 600432) and a C-terminal C1q signature domain (see MIM 120550) that mediates trimeric assembly of atypical collagen complexes. However, precerebellins do not contain a collagen motif, suggesting that they are not conventional components of the extracellular matrix (Pang et al., 2000 [PubMed 10964938]). CBLN3 ENSG00000139899 NA
cortexin 1 404217 NA CTXN1 ENSG00000178531 NA
keratin 2 3849 The protein encoded by this gene is a member of the keratin gene family. The type II cytokeratins consist of basic or neutral proteins which are arranged in pairs of heterotypic keratin chains coexpressed during differentiation of simple and stratified epithelial tissues. This type II cytokeratin is expressed largely in the upper spinous layer of epidermal keratinocytes and mutations in this gene have been associated with bullous congenital ichthyosiform erythroderma. The type II cytokeratins are clustered in a region of chromosome 12q12-q13. KRT2 ENSG00000172867 NA
ectodermal-neural cortex 1 8507 This gene encodes a member of the kelch-related family of actin-binding proteins. The encoded protein plays a role in the oxidative stress response as a regulator of the transcription factor Nrf2, and expression of this gene may play a role in malignant transformation. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene. ENC1 ENSG00000171617 NA
pleckstrin and Sec7 domain containing 5662 This gene encodes a pleckstrin homology and SEC7 domains-containing protein that functions as a guanine nucleotide exchange factor. The encoded protein regulates signal transduction by activating ADP-ribosylation factor 6. Alternative splicing results in multiple transcript variants. PSD ENSG00000059915 NA
NA ENSG00000269968 NA RP5-940J5.9 ENSG00000269968 NA
chromogranin B 1114 This gene encodes a tyrosine-sulfated secretory protein abundant in peptidergic endocrine cells and neurons. This protein may serve as a precursor for regulatory peptides. CHGB ENSG00000089199 NA
TIMP metallopeptidase inhibitor 2 7077 This gene is a member of the TIMP gene family. The proteins encoded by this gene family are natural inhibitors of the matrix metalloproteinases, a group of peptidases involved in degradation of the extracellular matrix. In addition to an inhibitory role against metalloproteinases, the encoded protein has a unique role among TIMP family members in its ability to directly suppress the proliferation of endothelial cells. As a result, the encoded protein may be critical to the maintenance of tissue homeostasis by suppressing the proliferation of quiescent tissues in response to angiogenic factors, and by inhibiting protease activity in tissues undergoing remodelling of the extracellular matrix. TIMP2 ENSG00000035862 NA
chimerin 1 1123 This gene encodes GTPase-activating protein for ras-related p21-rac and a phorbol ester receptor. It is predominantly expressed in neurons, and plays an important role in neuronal signal-transduction mechanisms. Mutations in this gene are associated with Duane’s retraction syndrome 2 (DURS2). Alternatively spliced transcript variants encoding different isoforms have been described for this gene. CHN1 ENSG00000128656 NA
PERP, TP53 apoptosis effector 64065 NA PERP ENSG00000112378 NA
desmoplakin 1832 This gene encodes a protein that anchors intermediate filaments to desmosomal plaques and forms an obligate component of functional desmosomes. Mutations in this gene are the cause of several cardiomyopathies and keratodermas, including skin fragility-woolly hair syndrome. Alternative splicing results in multiple transcript variants. DSP ENSG00000096696 NA
NA NA NA NA ENSG00000163486 TRUE
surfactant protein B 6439 This gene encodes the pulmonary-associated surfactant protein B (SPB), an amphipathic surfactant protein essential for lung function and homeostasis after birth. Pulmonary surfactant is a surface-active lipoprotein complex composed of 90% lipids and 10% proteins which include plasma proteins and apolipoproteins SPA, SPB, SPC and SPD. The surfactant is secreted by the alveolar cells of the lung and maintains the stability of pulmonary tissue by reducing the surface tension of fluids that coat the lung. The SPB enhances the rate of spreading and increases the stability of surfactant monolayers in vitro. Multiple mutations in this gene have been identified, which cause pulmonary surfactant metabolism dysfunction type 1, also called pulmonary alveolar proteinosis due to surfactant protein B deficiency, and are associated with fatal respiratory distress in the neonatal period. Alternatively spliced transcript variants encoding the same protein have been identified. SFTPB ENSG00000168878 NA
actin, beta 60 This gene encodes one of six different actin proteins. Actins are highly conserved proteins that are involved in cell motility, structure, and integrity. This actin is a major constituent of the contractile apparatus and one of the two nonmuscle cytoskeletal actins. ACTB ENSG00000075624 NA
T-cell lymphoma invasion and metastasis 1 7074 NA TIAM1 ENSG00000156299 NA
integral membrane protein 2C 81618 NA ITM2C ENSG00000135916 NA
Ca2+ dependent secretion activator 2 93664 This gene encodes a member of the calcium-dependent activator of secretion (CAPS) protein family, which are calcium binding proteins that regulate the exocytosis of synaptic and dense-core vesicles in neurons and neuroendocrine cells. Mutations in this gene may contribute to autism susceptibility. Multiple transcript variants encoding different isoforms have been found for this gene. CADPS2 ENSG00000081803 NA
myosin, heavy chain 6, cardiac muscle, alpha 4624 Cardiac muscle myosin is a hexamer consisting of two heavy chain subunits, two light chain subunits, and two regulatory subunits. This gene encodes the alpha heavy chain subunit of cardiac myosin. The gene is located 4kb downstream of the gene encoding the beta heavy chain subunit of cardiac myosin. Mutations in this gene cause familial hypertrophic cardiomyopathy and atrial septal defect 3. MYH6 ENSG00000197616 NA
GNAS complex locus 2778 This locus has a highly complex imprinted expression pattern. It gives rise to maternally, paternally, and biallelically expressed transcripts that are derived from four alternative promoters and 5’ exons. Some transcripts contain a differentially methylated region (DMR) at their 5’ exons, and this DMR is commonly found in imprinted genes and correlates with transcript expression. An antisense transcript is produced from an overlapping locus on the opposite strand. One of the transcripts produced from this locus, and the antisense transcript, are paternally expressed noncoding RNAs, and may regulate imprinting in this region. In addition, one of the transcripts contains a second overlapping ORF, which encodes a structurally unrelated protein - Alex. Alternative splicing of downstream exons is also observed, which results in different forms of the stimulatory G-protein alpha subunit, a key element of the classical signal transduction pathway linking receptor-ligand interactions with the activation of adenylyl cyclase and a variety of cellular responses. Multiple transcript variants encoding different isoforms have been found for this gene. Mutations in this gene result in pseudohypoparathyroidism type 1a, pseudohypoparathyroidism type 1b, Albright hereditary osteodystrophy, pseudopseudohypoparathyroidism, McCune-Albright syndrome, progressive osseous heteroplasia, polyostotic fibrous dysplasia of bone, and some pituitary tumors. GNAS ENSG00000087460 NA
suppressor of glucose, autophagy associated 1 140710 NA SOGA1 ENSG00000149639 NA
surfactant protein A2 729238 This gene is one of several genes encoding pulmonary-surfactant associated proteins (SFTPA) located on chromosome 10. Mutations in this gene and a highly similar gene located nearby, which affect the highly conserved carbohydrate recognition domain, are associated with idiopathic pulmonary fibrosis. The current version of the assembly displays only a single centromeric SFTPA gene pair rather than the two gene pairs shown in the previous assembly which were thought to have resulted from a duplication. SFTPA2 ENSG00000185303 NA
chromogranin A 1113 The protein encoded by this gene is a member of the chromogranin/secretogranin family of neuroendocrine secretory proteins. It is found in secretory vesicles of neurons and endocrine cells. This gene product is a precursor to three biologically active peptides; vasostatin, pancreastatin, and parastatin. These peptides act as autocrine or paracrine negative modulators of the neuroendocrine system. Two other peptides, catestatin and chromofungin, have antimicrobial activity and antifungal activity, respectively. Two transcript variants encoding different isoforms have been found for this gene. CHGA ENSG00000100604 NA
tetraspanin 9 10867 The protein encoded by this gene is a member of the transmembrane 4 superfamily, also known as the tetraspanin family. Most of these members are cell-surface proteins that are characterized by the presence of four hydrophobic domains. The proteins mediate signal transduction events that play a role in the regulation of cell development, activation, growth and motility. Alternatively spliced transcripts encoding the same protein have been identified. TSPAN9 ENSG00000011105 NA
VIM antisense RNA 1 100507347 NA VIM-AS1 ENSG00000229124 NA
NA NA NA NA ENSG00000117289 TRUE
microtubule associated monooxygenase, calponin and LIM domain containing 2 9645 NA MICAL2 ENSG00000133816 NA
FK506 binding protein 5 2289 The protein encoded by this gene is a member of the immunophilin protein family, which play a role in immunoregulation and basic cellular processes involving protein folding and trafficking. This encoded protein is a cis-trans prolyl isomerase that binds to the immunosuppressants FK506 and rapamycin. It is thought to mediate calcineurin inhibition. It also interacts functionally with mature hetero-oligomeric progesterone receptor complexes along with the 90 kDa heat shock protein and P23 protein. This gene has been found to have multiple polyadenylation sites. Alternative splicing results in multiple transcript variants. FKBP5 ENSG00000096060 NA
keratin 14 3861 This gene encodes a member of the keratin family, the most diverse group of intermediate filaments. This gene product, a type I keratin, is usually found as a heterotetramer with two keratin 5 molecules, a type II keratin. Together they form the cytoskeleton of epithelial cells. Mutations in the genes for these keratins are associated with epidermolysis bullosa simplex. At least one pseudogene has been identified at 17p12-p11. KRT14 ENSG00000186847 NA
MCF.2 cell line derived transforming sequence like 23263 This gene encodes a guanine nucleotide exchange factor that interacts specifically with the GTP-bound Rac1 and plays a role in the Rho/Rac signaling pathways. A variant in this gene was associated with osteoarthritis. Alternative splicing results in multiple transcript variants. MCF2L ENSG00000126217 NA
PTPRF interacting protein alpha 4 8497 PPFIA4, or liprin-alpha-4, belongs to the liprin-alpha gene family. See liprin-alpha-1 (LIP1, or PPFIA1; MIM 611054) for background on liprins. PPFIA4 ENSG00000143847 NA
transducin like enhancer of split 2 7089 NA TLE2 ENSG00000065717 NA
surfactant protein C 6440 This gene encodes the pulmonary-associated surfactant protein C (SPC), an extremely hydrophobic surfactant protein essential for lung function and homeostasis after birth. Pulmonary surfactant is a surface-active lipoprotein complex composed of 90% lipids and 10% proteins which include plasma proteins and apolipoproteins SPA, SPB, SPC and SPD. The surfactant is secreted by the alveolar cells of the lung and maintains the stability of pulmonary tissue by reducing the surface tension of fluids that coat the lung. Multiple mutations in this gene have been identified, which cause pulmonary surfactant metabolism dysfunction type 2, also called pulmonary alveolar proteinosis due to surfactant protein C deficiency, and are associated with interstitial lung disease in older infants, children, and adults. Alternatively spliced transcript variants encoding different protein isoforms have been identified. SFTPC ENSG00000168484 NA
surfactant protein A1 653509 This gene encodes a lung surfactant protein that is a member of a subfamily of C-type lectins called collectins. The encoded protein binds specific carbohydrate moieties found on lipids and on the surface of microorganisms. This protein plays an essential role in surfactant homeostasis and in the defense against respiratory pathogens. Mutations in this gene are associated with idiopathic pulmonary fibrosis. Alternate splicing results in multiple transcript variants. SFTPA1 ENSG00000122852 NA
NA ENSG00000234961 NA RP11-124N14.3 ENSG00000234961 NA
ERBB receptor feedback inhibitor 1 54206 ERRFI1 is a cytoplasmic protein whose expression is upregulated with cell growth (Wick et al., 1995 [PubMed 7641805]). It shares significant homology with the protein product of rat gene-33, which is induced during cell stress and mediates cell signaling (Makkinje et al., 2000 [PubMed 10749885]; Fiorentino et al., 2000 [PubMed 11003669]). ERRFI1 ENSG00000116285 NA
FK506 binding protein 8 23770 The protein encoded by this gene is a member of the immunophilin protein family, which play a role in immunoregulation and basic cellular processes involving protein folding and trafficking. Unlike the other members of the family, this encoded protein does not seem to have PPIase/rotamase activity. It may have a role in neurons associated with memory function. FKBP8 ENSG00000105701 NA
pyruvate kinase, muscle 5315 This gene encodes a protein involved in glycolysis. The encoded protein is a pyruvate kinase that catalyzes the transfer of a phosphoryl group from phosphoenolpyruvate to ADP, generating ATP and pyruvate. This protein has been shown to interact with thyroid hormone and may mediate cellular metabolic effects induced by thyroid hormones. This protein has been found to bind Opa protein, a bacterial outer membrane protein involved in gonococcal adherence to and invasion of human cells, suggesting a role of this protein in bacterial pathogenesis. Several alternatively spliced transcript variants encoding a few distinct isoforms have been reported. PKM ENSG00000067225 NA
Thy-1 cell surface antigen 7070 This gene encodes a cell surface glycoprotein and member of the immunoglobulin superfamily of proteins. The encoded protein is involved in cell adhesion and cell communication in numerous cell types, but particularly in cells of the immune and nervous systems. The encoded protein is widely used as a marker for hematopoietic stem cells. This gene may function as a tumor suppressor in nasopharyngeal carcinoma. Alternative splicing results in multiple transcript variants. THY1 ENSG00000154096 NA
natriuretic peptide A 4878 The protein encoded by this gene belongs to the natriuretic peptide family. Natriuretic peptides are implicated in the control of extracellular fluid volume and electrolyte homeostasis. This protein is synthesized as a large precursor (containing a signal peptide), which is processed to release a peptide from the N-terminus with similarity to vasoactive peptide, cardiodilatin, and another peptide from the C-terminus with natriuretic-diuretic activity. Mutations in this gene have been associated with atrial fibrillation familial type 6. This gene is located adjacent to another member of the natriuretic family of peptides on chromosome 1. NPPA ENSG00000175206 NA
fatty acid synthase 2194 The enzyme encoded by this gene is a multifunctional protein. Its main function is to catalyze the synthesis of palmitate from acetyl-CoA and malonyl-CoA, in the presence of NADPH, into long-chain saturated fatty acids. In some cancer cell lines, this protein has been found to be fused with estrogen receptor-alpha (ER-alpha), in which the N-terminus of FAS is fused in-frame with the C-terminus of ER-alpha. FASN ENSG00000169710 NA
ALS2, alsin Rho guanine nucleotide exchange factor 57679 The protein encoded by this gene contains an ATS1/RCC1-like domain, a RhoGEF domain, and a vacuolar protein sorting 9 (VPS9) domain, all of which are guanine-nucleotide exchange factors that activate members of the Ras superfamily of GTPases. The protein functions as a guanine nucleotide exchange factor for the small GTPase RAB5. The protein localizes with RAB5 on early endosomal compartments, and functions as a modulator for endosomal dynamics. Mutations in this gene result in several forms of juvenile lateral sclerosis and infantile-onset ascending spastic paralysis. Multiple transcript variants encoding different isoforms have been found for this gene. ALS2 ENSG00000003393 NA
heat shock protein 90kDa alpha family class A member 1 3320 The protein encoded by this gene is an inducible molecular chaperone that functions as a homodimer. The encoded protein aids in the proper folding of specific target proteins by use of an ATPase activity that is modulated by co-chaperones. Two transcript variants encoding different isoforms have been found for this gene. HSP90AA1 ENSG00000080824 NA
collagen type XXVII alpha 1 85301 This gene encodes a member of the fibrillar collagen family, and plays a role during the calcification of cartilage and the transition of cartilage to bone. The encoded protein product is a preproprotein. It includes an N-terminal signal peptide, which is followed by an N-terminal propeptide, mature peptide and a C-terminal propeptide. The N-terminal propeptide contains thrombospondin N-terminal-like and laminin G-like domains. The mature peptide is a major triple-helical region. The C-terminal propeptide, also known as COLFI domain, plays crucial roles in tissue growth and repair. Mutations in this gene cause Steel syndrome. Alternatively spliced transcript variants have been found, but the full-length nature of some variants has not been determined. COL27A1 ENSG00000196739 NA
suprabasin 374897 NA SBSN ENSG00000189001 NA
forkhead box N3 1112 This gene is a member of the forkhead/winged helix transcription factor family. Checkpoints are eukaryotic DNA damage-inducible cell cycle arrests at G1 and G2. Checkpoint suppressor 1 suppresses multiple yeast checkpoint mutations including mec1, rad9, rad53 and dun1 by activating a MEC1-independent checkpoint pathway. Alternative splicing is observed at the locus, resulting in distinct isoforms. FOXN3 ENSG00000053254 NA
basic helix-loop-helix family member e40 8553 This gene encodes a basic helix-loop-helix protein expressed in various tissues. The encoded protein can interact with ARNTL or compete for E-box binding sites in the promoter of PER1 and repress CLOCK/ARNTL’s transactivation of PER1. This gene is believed to be involved in the control of circadian rhythm and cell differentiation. BHLHE40 ENSG00000134107 NA
pyruvate dehydrogenase kinase 4 5166 This gene is a member of the PDK/BCKDK protein kinase family and encodes a mitochondrial protein with a histidine kinase domain. This protein is located in the matrix of the mitochondria and inhibits the pyruvate dehydrogenase complex by phosphorylating one of its subunits, thereby contributing to the regulation of glucose metabolism. Expression of this gene is regulated by glucocorticoids, retinoic acid and insulin. PDK4 ENSG00000004799 NA
integrin subunit alpha 5 3678 The product of this gene belongs to the integrin alpha chain family. Integrins are heterodimeric integral membrane proteins composed of an alpha subunit and a beta subunit that function in cell surface adhesion and signaling. The encoded preproprotein is proteolytically processed to generate light and heavy chains that comprise the alpha 5 subunit. This subunit associates with the beta 1 subunit to form a fibronectin receptor. This integrin may promote tumor invasion, and higher expression of this gene may be correlated with shorter survival time in lung cancer patients. Note that the integrin alpha 5 and integrin alpha V subunits are encoded by distinct genes. ITGA5 ENSG00000161638 NA
BAI1 associated protein 3 8938 This p53-target gene encodes a brain-specific angiogenesis inhibitor. The protein is a seven-span transmembrane protein and a member of the secretin receptor family. It interacts with the cytoplasmic region of brain-specific angiogenesis inhibitor 1. This protein also contains two C2 domains, which are often found in proteins involved in signal transduction or membrane trafficking. Its expression pattern and similarity to other proteins suggest that it may be involved in synaptic functions. Several transcript variants encoding different isoforms have been found for this gene. BAIAP3 ENSG00000007516 NA
laminin subunit alpha 5 3911 This gene encodes one of the vertebrate laminin alpha chains. Laminins, a family of extracellular matrix glycoproteins, are the major noncollagenous constituent of basement membranes. They have been implicated in a wide variety of biological processes including cell adhesion, differentiation, migration, signaling, neurite outgrowth and metastasis. Laminins are composed of 3 non-identical chains: laminin alpha, beta and gamma (formerly A, B1, and B2, respectively) and they form a cruciform structure consisting of 3 short arms, each formed by a different chain, and a long arm composed of all 3 chains. Each laminin chain is a multidomain protein encoded by a distinct gene. The protein encoded by this gene is the alpha-5 subunit of laminin-10 (laminin-511), laminin-11 (laminin-521) and laminin-15 (laminin-523). LAMA5 ENSG00000130702 NA
PATJ, crumbs cell polarity complex component 10207 This gene encodes a protein with multiple PDZ domains. PDZ domains mediate protein-protein interactions, and proteins with multiple PDZ domains often organize multimeric complexes at the plasma membrane. This protein localizes to tight junctions and to the apical membrane of epithelial cells. A similar protein in Drosophila is a scaffolding protein which tethers several members of a multimeric signaling complex in photoreceptors. PATJ ENSG00000132849 NA
transglutaminase 2 7052 Transglutaminases are enzymes that catalyze the crosslinking of proteins by epsilon-gamma glutamyl lysine isopeptide bonds. While the primary structure of transglutaminases is not conserved, they all have the same amino acid sequence at their active sites and their activity is calcium-dependent. The protein encoded by this gene acts as a monomer, is induced by retinoic acid, and appears to be involved in apoptosis. Finally, the encoded protein is the autoantigen implicated in celiac disease. Two transcript variants encoding different isoforms have been found for this gene. TGM2 ENSG00000198959 NA
protein kinase (cAMP-dependent, catalytic) inhibitor beta 5570 This gene encodes a member of the cAMP-dependent protein kinase inhibitor family. The encoded protein may play a role in the protein kinase A (PKA) pathway by interacting with the catalytic subunit of PKA, and overexpression of this gene may play a role in prostate cancer. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene. PKIB ENSG00000135549 NA
ornithine decarboxylase antizyme 1 4946 The protein encoded by this gene belongs to the ornithine decarboxylase antizyme family, which plays a role in cell growth and proliferation by regulating intracellular polyamine levels. Expression of antizymes requires +1 ribosomal frameshifting, which is enhanced by high levels of polyamines. Antizymes in turn bind to and inhibit ornithine decarboxylase (ODC), the key enzyme in polyamine biosynthesis; thus, completing the auto-regulatory circuit. This gene encodes antizyme 1, the first member of the antizyme family, that has broad tissue distribution, and negatively regulates intracellular polyamine levels by binding to and targeting ODC for degradation, as well as inhibiting polyamine uptake. Antizyme 1 mRNA contains two potential in-frame AUGs; and studies in rat suggest that alternative use of the two translation initiation sites results in N-terminally distinct protein isoforms with different subcellular localization. Alternatively spliced transcript variants have also been noted for this gene. OAZ1 ENSG00000104904 NA
dermokine 93099 This gene is upregulated in inflammatory diseases, and it was first observed as expressed in the differentiated layers of skin. The most interesting aspect of this gene is the differential use of promoters and terminators to generate isoforms with unique cellular distributions and domain components. Alternatively spliced transcript variants encoding different isoforms have been identified for this gene. DMKN ENSG00000161249 NA
calcium/calmodulin dependent protein kinase II inhibitor 1 55450 NA CAMK2N1 ENSG00000162545 NA
chromodomain helicase DNA binding protein 7 55636 This gene encodes a protein that contains several helicase family domains. Mutations in this gene have been found in some patients with the CHARGE syndrome. Two transcript variants encoding different isoforms have been found for this gene. CHD7 ENSG00000171316 NA
calmodulin like 5 51806 This gene encodes a novel calcium binding protein expressed in the epidermis and related to the calmodulin family of calcium binding proteins. Functional studies with recombinant protein demonstrate it does bind calcium and undergoes a conformational change when it does so. Abundant expression is detected only in reconstructed epidermis and is restricted to differentiating keratinocytes. In addition, it can associate with transglutaminase 3, shown to be a key enzyme in the terminal differentiation of keratinocytes. CALML5 ENSG00000178372 NA
protein disulfide isomerase family A member 2 64714 Protein disulfide isomerases (EC 5.3.4.1), such as PDIP, are endoplasmic reticulum (ER) resident proteins that catalyze protein folding and thiol-disulfide interchange reactions (Desilva et al., 1996 [PubMed 8561901]). PDIA2 ENSG00000185615 NA
plakophilin 1 5317 This gene encodes a member of the arm-repeat (armadillo) and plakophilin gene families. Plakophilin proteins contain numerous armadillo repeats, localize to cell desmosomes and nuclei, and participate in linking cadherins to intermediate filaments in the cytoskeleton. This protein may be involved in molecular recruitment and stabilization during desmosome formation. Mutations in this gene have been associated with the ectodermal dysplasia/skin fragility syndrome. Two transcript variants encoding different isoforms have been found for this gene. PKP1 ENSG00000081277 NA
aldolase, fructose-bisphosphate A 226 The protein encoded by this gene, Aldolase A (fructose-bisphosphate aldolase), is a glycolytic enzyme that catalyzes the reversible conversion of fructose-1,6-bisphosphate to glyceraldehyde 3-phosphate and dihydroxyacetone phosphate. Three aldolase isozymes (A, B, and C), encoded by three different genes, are differentially expressed during development. Aldolase A is found in the developing embryo and is produced in even greater amounts in adult muscle. Aldolase A expression is repressed in adult liver, kidney and intestine and similar to aldolase C levels in brain and other nervous tissue. Aldolase A deficiency has been associated with myopathy and hemolytic anemia. Alternative splicing and alternative promoter usage results in multiple transcript variants. Related pseudogenes have been identified on chromosomes 3 and 10. ALDOA ENSG00000149925 NA
proline rich coiled-coil 2A 7916 A cluster of genes, BAT1-BAT5, has been localized in the vicinity of the genes for TNF alpha and TNF beta. These genes are all within the human major histocompatibility complex class III region. This gene has microsatellite repeats which are associated with the age-at-onset of insulin-dependent diabetes mellitus (IDDM) and possibly thought to be involved with the inflammatory process of pancreatic beta-cell destruction during the development of IDDM. This gene is also a candidate gene for the development of rheumatoid arthritis. Two transcript variants encoding the same protein have been found for this gene. PRRC2A ENSG00000204469 NA
solute carrier family 38 member 1 81539 Amino acid transporters play essential roles in the uptake of nutrients, production of energy, chemical metabolism, detoxification, and neurotransmitter cycling. SLC38A1 is an important transporter of glutamine, an intermediate in the detoxification of ammonia and the production of urea. Glutamine serves as a precursor for the synaptic transmitter, glutamate (Gu et al., 2001 [PubMed 11325958]). SLC38A1 ENSG00000111371 NA
adenosine deaminase, RNA specific B1 104 This gene encodes the enzyme responsible for pre-mRNA editing of the glutamate receptor subunit B by site-specific deamination of adenosines. Studies in rat found that this enzyme acted on its own pre-mRNA molecules to convert an AA dinucleotide to an AI dinucleotide which resulted in a new splice site. Alternative splicing of this gene results in several transcript variants, some of which have been characterized by the presence or absence of an ALU cassette insert and a short or long C-terminal region. ADARB1 ENSG00000197381 NA
heat shock protein family B (small) member 7 27129 NA HSPB7 ENSG00000173641 NA
tropomyosin 3 7170 This gene encodes a member of the tropomyosin family of actin-binding proteins. Tropomyosins are dimers of coiled-coil proteins that provide stability to actin filaments and regulate access of other actin-binding proteins. Mutations in this gene result in autosomal dominant nemaline myopathy and other muscle disorders. This locus is involved in translocations with other loci, including anaplastic lymphoma receptor tyrosine kinase (ALK) and neurotrophic tyrosine kinase receptor type 1 (NTRK1), which result in the formation of fusion proteins that act as oncogenes. There are numerous pseudogenes for this gene on different chromosomes. Alternative splicing results in multiple transcript variants. TPM3 ENSG00000143549 NA
ubiquitin C-terminal hydrolase L1 7345 The protein encoded by this gene belongs to the peptidase C12 family. This enzyme is a thiol protease that hydrolyzes a peptide bond at the C-terminal glycine of ubiquitin. This gene is specifically expressed in the neurons and in cells of the diffuse neuroendocrine system. Mutations in this gene may be associated with Parkinson disease. UCHL1 ENSG00000154277 NA
loricrin 4014 This gene encodes loricrin, a major protein component of the cornified cell envelope found in terminally differentiated epidermal cells. Mutations in this gene are associated with Vohwinkel’s syndrome and progressive symmetric erythrokeratoderma, both inherited skin diseases. LOR ENSG00000203782 NA
NA ENSG00000271795 NA CTC-251D13.1 ENSG00000271795 NA
tyrosine kinase non receptor 2 10188 This gene encodes a tyrosine kinase that binds Cdc42Hs in its GTP-bound form and inhibits both the intrinsic and GTPase-activating protein (GAP)-stimulated GTPase activity of Cdc42Hs. This binding is mediated by a unique sequence of 47 amino acids C-terminal to an SH3 domain. The protein may be involved in a regulatory mechanism that sustains the GTP-bound active form of Cdc42Hs and which is directly linked to a tyrosine phosphorylation signal transduction pathway. Several alternatively spliced transcript variants have been identified from this gene, but the full-length nature of only two transcript variants has been determined. TNK2 ENSG00000061938 NA
regulator of G-protein signaling 14 10636 This gene encodes a member of the regulator of G-protein signaling family. This protein contains one RGS domain, two Raf-like Ras-binding domains (RBDs), and one GoLoco domain. The protein attenuates the signaling activity of G-proteins by binding, through its GoLoco domain, to specific types of activated, GTP-bound G alpha subunits. Acting as a GTPase activating protein (GAP), the protein increases the rate of conversion of the GTP to GDP. This hydrolysis allows the G alpha subunits to bind G beta/gamma subunit heterodimers, forming inactive G-protein heterotrimers, thereby terminating the signal. Alternate transcriptional splice variants of this gene have been observed but have not been thoroughly characterized. RGS14 ENSG00000169220 NA
fibrosin 64319 Fibrosin is a lymphokine secreted by activated lymphocytes that induces fibroblast proliferation (Prakash and Robbins, 1998 [PubMed 9809749]). FBRS ENSG00000156860 NA
microtubule associated serine/threonine kinase 3 23031 NA MAST3 ENSG00000099308 NA
CAP-Gly domain containing linker protein 1 6249 The protein encoded by this gene links endocytic vesicles to microtubules. This gene is highly expressed in Reed-Sternberg cells of Hodgkin disease. Several transcript variants encoding different isoforms have been found for this gene. CLIP1 ENSG00000130779 NA
metallothionein 3 4504 NA MT3 ENSG00000087250 NA
Kruppel like factor 9 687 The protein encoded by this gene is a transcription factor that binds to GC box elements located in the promoter. Binding of the encoded protein to a single GC box inhibits mRNA expression while binding to tandemly repeated GC box elements activates transcription. KLF9 ENSG00000119138 NA
low density lipoprotein receptor adaptor protein 1 26119 The protein encoded by this gene is a cytosolic protein which contains a phosphotyrosine binding (PTD) domain. The PTD domain has been found to interact with the cytoplasmic tail of the LDL receptor. Mutations in this gene lead to LDL receptor malfunction and cause the disorder autosomal recessive hypercholesterolaemia. LDLRAP1 ENSG00000157978 NA
elastin 2006 This gene encodes a protein that is one of the two components of elastic fibers. The encoded protein is rich in hydrophobic amino acids such as glycine and proline, which form mobile hydrophobic regions bounded by crosslinks between lysine residues. Deletions and mutations in this gene are associated with supravalvular aortic stenosis (SVAS) and autosomal dominant cutis laxa. Multiple transcript variants encoding different isoforms have been found for this gene. ELN ENSG00000049540 NA
ATH1, acid trehalase-like 1 (yeast) 80162 NA ATHL1 ENSG00000142102 NA
FXYD domain containing ion transport regulator 7 53822 This reference sequence was derived from multiple replicate ESTs and validated by similar human genomic sequence. This gene encodes a member of a family of small membrane proteins that share a 35-amino acid signature sequence domain, beginning with the sequence PFXYD and containing 7 invariant and 6 highly conserved amino acids. The approved human gene nomenclature for the family is FXYD-domain containing ion transport regulator. Transmembrane topology has been established for two family members (FXYD1 and FXYD2), with the N-terminus extracellular and the C-terminus on the cytoplasmic side of the membrane. FXYD2, also known as the gamma subunit of the Na,K-ATPase, regulates the properties of that enzyme. FXYD1 (phospholemman), FXYD2 (gamma), FXYD3 (MAT-8), FXYD4 (CHIF), and FXYD5 (RIC) have been shown to induce channel activity in experimental expression systems. This gene product, FXYD7, is novel and has not been characterized as a protein. [RefSeq curation by Kathleen J. Sweadner, Ph.D., sweadner@helix.mgh.harvard.edu., Dec 2000]. FXYD7 ENSG00000221946 NA
calmodulin 2 (phosphorylase kinase, delta) 805 This gene is a member of the calmodulin gene family. There are three distinct calmodulin genes dispersed throughout the genome that encode the identical protein, but differ at the nucleotide level. Calmodulin is a calcium binding protein that plays a role in signaling pathways, cell cycle progression and proliferation. Several infants with severe forms of long-QT syndrome (LQTS) who displayed life-threatening ventricular arrhythmias together with delayed neurodevelopment and epilepsy were found to have mutations in either this gene or another member of the calmodulin gene family (PMID:23388215). Mutations in this gene have also been identified in patients with less severe forms of LQTS (PMID:24917665), while mutations in another calmodulin gene family member have been associated with catecholaminergic polymorphic ventricular tachycardia (CPVT)(PMID:23040497), a rare disorder thought to be the cause of a significant fraction of sudden cardiac deaths in young individuals. Pseudogenes of this gene are found on chromosomes 10, 13, and 17. Alternative splicing results in multiple transcript variants encoding different isoforms. CALM2 ENSG00000143933 NA
major histocompatibility complex, class I, B 3106 HLA-B belongs to the HLA class I heavy chain paralogues. This class I molecule is a heterodimer consisting of a heavy chain and a light chain (beta-2 microglobulin). The heavy chain is anchored in the membrane. Class I molecules play a central role in the immune system by presenting peptides derived from the endoplasmic reticulum lumen. They are expressed in nearly all cells. The heavy chain is approximately 45 kDa and its gene contains 8 exons. Exon 1 encodes the leader peptide, exon 2 and 3 encode the alpha1 and alpha2 domains, which both bind the peptide, exon 4 encodes the alpha3 domain, exon 5 encodes the transmembrane region and exons 6 and 7 encode the cytoplasmic tail. Polymorphisms within exon 2 and exon 3 are responsible for the peptide binding specificity of each class one molecule. Typing for these polymorphisms is routinely done for bone marrow and kidney transplantation. Hundreds of HLA-B alleles have been described. HLA-B ENSG00000234745 NA
murine retrovirus integration site 1 homolog 10335 This gene is similar to a putative mouse tumor suppressor gene (Mrvi1) that is frequently disrupted by mouse AIDS-related virus (MRV). The encoded protein, which is found in the membrane of the endoplasmic reticulum, is similar to Jaw1, a lymphoid-restricted protein whose expression is down-regulated during lymphoid differentiation. This protein is a substrate of cGMP-dependent kinase-1 (PKG1) that can function as a regulator of IP3-induced calcium release. Studies in mouse suggest that MRV integration at Mrvi1 induces myeloid leukemia by altering the expression of a gene important for myeloid cell growth and/or differentiation, and thus this gene may function as a myeloid leukemia tumor suppressor gene. Several alternatively spliced transcript variants encoding different isoforms have been found for this gene, and alternative translation start sites, including a non-AUG (CUG) start site, are used. MRVI1 ENSG00000072952 NA
interferon induced transmembrane protein 3 10410 The protein encoded by this gene is an interferon-induced membrane protein that helps confer immunity to influenza A H1N1 virus, West Nile virus, and dengue virus. Two transcript variants, only one of them protein-coding, have been found for this gene. Another variant encoding an N-terminally truncated isoform has been reported, but the full-length nature of this variant has not been determined. IFITM3 ENSG00000142089 NA
metallothionein 1X 4501 NA MT1X ENSG00000187193 NA
transmembrane protein 178A 130733 NA TMEM178A ENSG00000152154 NA
intercellular adhesion molecule 5 7087 The protein encoded by this gene is a member of the intercellular adhesion molecule (ICAM) family. All ICAM proteins are type I transmembrane glycoproteins, contain 2-9 immunoglobulin-like C2-type domains, and bind to the leukocyte adhesion LFA-1 protein. This protein is expressed on the surface of telencephalic neurons and displays two types of adhesion activity, homophilic binding between neurons and heterophilic binding between neurons and leukocytes. It may be a critical component in neuron-microglial cell interactions in the course of normal development or as part of neurodegenerative diseases. ICAM5 ENSG00000105376 NA
basigin (Ok blood group) 682 The protein encoded by this gene is a plasma membrane protein that is important in spermatogenesis, embryo implantation, neural network formation, and tumor progression. The encoded protein is also a member of the immunoglobulin superfamily. Multiple transcript variants encoding different isoforms have been found for this gene. BSG ENSG00000172270 NA
collagen type VI alpha 2 1292 This gene encodes one of the three alpha chains of type VI collagen, a beaded filament collagen found in most connective tissues. The product of this gene contains several domains similar to von Willebrand Factor type A domains. These domains have been shown to bind extracellular matrix proteins, an interaction that explains the importance of this collagen in organizing matrix components. Mutations in this gene are associated with Bethlem myopathy and Ullrich scleroatonic muscular dystrophy. Three transcript variants have been identified for this gene. COL6A2 ENSG00000142173 NA
write.table(as.factor(out$query), paste0("../utilities/GTEX2013_sparse_load_sqrt/gene_names_clus_",11,".txt"), col.names = FALSE,
            row.names=FALSE, quote=FALSE);
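
Some queried Ensembl IDs come back unmatched (for example the ENSG00000117289 row above, which has NA in every annotation field and TRUE in the last column). The following is a minimal sketch, not part of the original analysis, of how such rows could be dropped before the IDs are written out. It assumes the last column of the query result is named `notfound`; the helper `drop_unmatched` and the toy data frame are hypothetical and stand in for `as.data.frame(out)`.

```r
# Hypothetical helper (not in the original analysis): keep only the rows of
# an annotation table whose query ID was matched by the annotation service.
# Assumption: unmatched rows are flagged TRUE in a column named `notfound`,
# and matched rows carry NA in that column.
drop_unmatched <- function(annot) {
  annot <- as.data.frame(annot)
  if ("notfound" %in% names(annot)) {
    keep <- is.na(annot$notfound) | !annot$notfound
    annot <- annot[keep, , drop = FALSE]
  }
  annot
}

# Toy stand-in for as.data.frame(out), using three IDs from the table above.
toy <- data.frame(
  query    = c("ENSG00000087460", "ENSG00000117289", "ENSG00000149639"),
  symbol   = c("GNAS", NA, "SOGA1"),
  notfound = c(NA, TRUE, NA),
  stringsAsFactors = FALSE
)

kept <- drop_unmatched(toy)
print(kept$query)
```

Whether unmatched IDs should be excluded from the per-factor gene lists depends on the downstream use; the original code writes every queried ID.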

Factor 12 Annotations

out <- mygene::queryMany(gene_list[12,],  scopes="ensembl.gene", fields=c("name", "summary", "symbol"), species="human");
## Finished
## Pass returnall=TRUE to return lists of duplicate or missing query terms.
kable(as.data.frame(out))
summary X_id query symbol name
This gene encodes a member of the intermediate filament family. Intermediate filaments, along with microtubules and actin microfilaments, make up the cytoskeleton. The protein encoded by this gene is responsible for maintaining cell shape, integrity of the cytoplasm, and stabilizing cytoskeletal interactions. It is also involved in the immune response, and controls the transport of low-density lipoprotein (LDL)-derived cholesterol from a lysosome to the site of esterification. It functions as an organizer of a number of critical proteins involved in attachment, migration, and cell signaling. Mutations in this gene cause a dominant, pulverulent cataract. 7431 ENSG00000026025 VIM vimentin
This gene encodes a muscle-specific class III intermediate filament. Homopolymers of this protein form a stable intracytoplasmic filamentous network connecting myofibrils to each other and to the plasma membrane. Mutations in this gene are associated with desmin-related myopathy, a familial cardiac and skeletal myopathy (CSM), and with distal myopathies. 1674 ENSG00000175084 DES desmin
This gene encodes one of the three alpha chains of type VI collagen, a beaded filament collagen found in most connective tissues. The product of this gene contains several domains similar to von Willebrand Factor type A domains. These domains have been shown to bind extracellular matrix proteins, an interaction that explains the importance of this collagen in organizing matrix components. Mutations in this gene are associated with Bethlem myopathy and Ullrich scleroatonic muscular dystrophy. Three transcript variants have been identified for this gene. 1292 ENSG00000142173 COL6A2 collagen type VI alpha 2
Protamines substitute for histones in the chromatin of sperm during the haploid phase of spermatogenesis, and are the major DNA-binding proteins in the nucleus of sperm in many vertebrates. They package the sperm DNA into a highly condensed complex in a volume less than 5% of a somatic cell nucleus. Many mammalian species have only one protamine (protamine 1); however, a few species, including human and mouse, have two. This gene encodes protamine 2, which is cleaved to give rise to a family of protamine 2 peptides. Alternatively spliced transcript variants have also been found for this gene. 5620 ENSG00000122304 PRM2 protamine 2
This gene encodes a member of the type I (acidic) cytokeratin family, which belongs to the superfamily of intermediate filament (IF) proteins. Keratins are heteropolymeric structural proteins which form the intermediate filament. These filaments, along with actin microfilaments and microtubules, compose the cytoskeleton of epithelial cells. Mutations in this gene are associated with epidermolytic hyperkeratosis. This gene is located within a cluster of keratin family members on chromosome 17q21. 3858 ENSG00000186395 KRT10 keratin 10
This gene encodes the light subunit of the ferritin protein. Ferritin is the major intracellular iron storage protein in prokaryotes and eukaryotes. It is composed of 24 subunits of the heavy and light ferritin chains. Variation in ferritin subunit composition may affect the rates of iron uptake and release in different tissues. A major function of ferritin is the storage of iron in a soluble and nontoxic state. Defects in this light chain ferritin gene are associated with several neurodegenerative diseases and hyperferritinemia-cataract syndrome. This gene has multiple pseudogenes. 2512 ENSG00000087086 FTL ferritin, light polypeptide
Cardiac muscle myosin is a hexamer consisting of two heavy chain subunits, two light chain subunits, and two regulatory subunits. This gene encodes the alpha heavy chain subunit of cardiac myosin. The gene is located 4kb downstream of the gene encoding the beta heavy chain subunit of cardiac myosin. Mutations in this gene cause familial hypertrophic cardiomyopathy and atrial septal defect 3. 4624 ENSG00000197616 MYH6 myosin, heavy chain 6, cardiac muscle, alpha
This gene encodes a protein that anchors intermediate filaments to desmosomal plaques and forms an obligate component of functional desmosomes. Mutations in this gene are the cause of several cardiomyopathies and keratodermas, including skin fragility-woolly hair syndrome. Alternative splicing results in multiple transcript variants. 1832 ENSG00000096696 DSP desmoplakin
NA 64065 ENSG00000112378 PERP PERP, TP53 apoptosis effector
NA 5619 ENSG00000175646 PRM1 protamine 1
This gene encodes a member of the small leucine-rich proteoglycan family of proteins. Alternative splicing results in multiple transcript variants, at least one of which encodes a preproprotein that is proteolytically processed to generate the mature protein. This protein plays a role in collagen fibril assembly. Binding of this protein to multiple cell surface receptors mediates its role in tumor suppression, including a stimulatory effect on autophagy and inflammation and an inhibitory effect on angiogenesis and tumorigenesis. This gene and the related gene biglycan are thought to be the result of a gene duplication. Mutations in this gene are associated with congenital stromal corneal dystrophy in human patients. 1634 ENSG00000011465 DCN decorin
The alpha (HBA) and beta (HBB) loci determine the structure of the 2 types of polypeptide chains in adult hemoglobin, Hb A. The normal adult hemoglobin tetramer consists of two alpha chains and two beta chains. Mutant beta globin causes sickle cell anemia. Absence of beta chain causes beta-zero-thalassemia. Reduced amounts of detectable beta globin cause beta-plus-thalassemia. The order of the genes in the beta-globin cluster is 5’-epsilon – gamma-G – gamma-A – delta – beta–3’. 3043 ENSG00000244734 HBB hemoglobin subunit beta
This gene encodes a member of the arm-repeat (armadillo) and plakophilin gene families. Plakophilin proteins contain numerous armadillo repeats, localize to cell desmosomes and nuclei, and participate in linking cadherins to intermediate filaments in the cytoskeleton. This protein may be involved in molecular recruitment and stabilization during desmosome formation. Mutations in this gene have been associated with the ectodermal dysplasia/skin fragility syndrome. Two transcript variants encoding different isoforms have been found for this gene. 5317 ENSG00000081277 PKP1 plakophilin 1
Fibulin 1 is a secreted glycoprotein that becomes incorporated into a fibrillar extracellular matrix. Calcium-binding is apparently required to mediate its binding to laminin and nidogen. It mediates platelet adhesion via binding fibrinogen. Four splice variants which differ in the 3’ end have been identified. Each variant encodes a different isoform, but no functional distinctions have been identified among the four variants. 2192 ENSG00000077942 FBLN1 fibulin 1
This gene encodes the pro-alpha1 chains of type III collagen, a fibrillar collagen that is found in extensible connective tissues such as skin, lung, uterus, intestine and the vascular system, frequently in association with type I collagen. Mutations in this gene are associated with Ehlers-Danlos syndrome type IV, and with aortic and arterial aneurysms. Two transcripts, resulting from the use of alternate polyadenylation signals, have been identified for this gene. 1281 ENSG00000168542 COL3A1 collagen type III alpha 1 chain
This gene encodes the pro-alpha1 chains of type I collagen whose triple helix comprises two alpha1 chains and one alpha2 chain. Type I is a fibril-forming collagen found in most connective tissues and is abundant in bone, cornea, dermis and tendon. Mutations in this gene are associated with osteogenesis imperfecta types I-IV, Ehlers-Danlos syndrome type VIIA, Ehlers-Danlos syndrome Classical type, Caffey Disease and idiopathic osteoporosis. Reciprocal translocations between chromosomes 17 and 22, where this gene and the gene for platelet-derived growth factor beta are located, are associated with a particular type of skin tumor called dermatofibrosarcoma protuberans, resulting from unregulated expression of the growth factor. Two transcripts, resulting from the use of alternate polyadenylation signals, have been identified for this gene. 1277 ENSG00000108821 COL1A1 collagen type I alpha 1
The protein encoded by this gene belongs to the natriuretic peptide family. Natriuretic peptides are implicated in the control of extracellular fluid volume and electrolyte homeostasis. This protein is synthesized as a large precursor (containing a signal peptide), which is processed to release a peptide from the N-terminus with similarity to vasoactive peptide, cardiodilatin, and another peptide from the C-terminus with natriuretic-diuretic activity. Mutations in this gene have been associated with atrial fibrillation familial type 6. This gene is located adjacent to another member of the natriuretic family of peptides on chromosome 1. 4878 ENSG00000175206 NPPA natriuretic peptide A
This gene encodes a member of the keratin family, the most diverse group of intermediate filaments. This gene product, a type I keratin, is usually found as a heterotetramer with two keratin 5 molecules, a type II keratin. Together they form the cytoskeleton of epithelial cells. Mutations in the genes for these keratins are associated with epidermolysis bullosa simplex. At least one pseudogene has been identified at 17p12-p11. 3861 ENSG00000186847 KRT14 keratin 14
The protein encoded by this gene is a member of the keratin gene family. The type II cytokeratins consist of basic or neutral proteins which are arranged in pairs of heterotypic keratin chains coexpressed during differentiation of simple and stratified epithelial tissues. This type II cytokeratin is specifically expressed in the basal layer of the epidermis with family member KRT14. Mutations in these genes have been associated with a complex of diseases termed epidermolysis bullosa simplex. The type II cytokeratins are clustered in a region of chromosome 12q12-q13. 3852 ENSG00000186081 KRT5 keratin 5
The secreted protein encoded by this gene is growth factor-inducible and promotes the adhesion of endothelial cells. The encoded protein interacts with several integrins and with heparan sulfate proteoglycan. This protein also plays a role in cell proliferation, differentiation, angiogenesis, apoptosis, and extracellular matrix formation. 3491 ENSG00000142871 CYR61 cysteine rich angiogenic inducer 61
This gene encodes a major cytoplasmic protein which is the only known constituent common to submembranous plaques of both desmosomes and intermediate junctions. This protein forms distinct complexes with cadherins and desmosomal cadherins and is a member of the catenin family since it contains a distinct repeating amino acid motif called the armadillo repeat. Mutation in this gene has been associated with Naxos disease. Alternative splicing occurs in this gene; however, not all transcripts have been fully described. 3728 ENSG00000173801 JUP junction plakoglobin
NA 374897 ENSG00000189001 SBSN suprabasin
NA 100507347 ENSG00000229124 VIM-AS1 VIM antisense RNA 1
This gene encodes the alpha-3 chain, one of the three alpha chains of type VI collagen, a beaded filament collagen found in most connective tissues. The alpha-3 chain of type VI collagen is much larger than the alpha-1 and -2 chains. This difference in size is largely due to an increase in the number of subdomains, similar to von Willebrand Factor type A domains, that are found in the amino terminal globular domain of all the alpha chains. These domains have been shown to bind extracellular matrix proteins, an interaction that explains the importance of this collagen in organizing matrix components. Mutations in the type VI collagen genes are associated with Bethlem myopathy, a rare autosomal dominant proximal myopathy with early childhood onset. Mutations in this gene are also a cause of Ullrich congenital muscular dystrophy, also referred to as Ullrich scleroatonic muscular dystrophy, an autosomal recessive congenital myopathy that is more severe than Bethlem myopathy. Multiple transcript variants have been identified, but the full-length nature of only some of these variants has been described. 1293 ENSG00000163359 COL6A3 collagen type VI alpha 3 chain
The protein encoded by this gene is secreted and likely acts as an inhibitor of bone formation. The encoded protein is found in the organic matrix of bone and cartilage. Defects in this gene are a cause of Keutel syndrome (KS). Two transcript variants encoding different isoforms have been found for this gene. 4256 ENSG00000111341 MGP matrix Gla protein
NA ENSG00000237973 ENSG00000237973 MTCO1P12 MT-CO1 pseudogene 12
The protein encoded by this gene is a member of the keratin gene family. The type II cytokeratins consist of basic or neutral proteins which are arranged in pairs of heterotypic keratin chains coexpressed during differentiation of simple and stratified epithelial tissues. This type II cytokeratin is expressed largely in the upper spinous layer of epidermal keratinocytes and mutations in this gene have been associated with bullous congenital ichthyosiform erythroderma. The type II cytokeratins are clustered in a region of chromosome 12q12-q13. 3849 ENSG00000172867 KRT2 keratin 2
This gene encodes a highly glycosylated plasma protein involved in the regulation of the complement cascade. Its protein inhibits activated C1r and C1s of the first complement component and thus regulates complement activation. Deficiency of this protein is associated with hereditary angioneurotic oedema (HANE). Alternative splicing results in multiple transcript variants encoding the same isoform. 710 ENSG00000149131 SERPING1 serpin family G member 1
This gene encodes one of three related filamin genes, specifically gamma filamin. These filamin proteins crosslink actin filaments into orthogonal networks in cortical cytoplasm and participate in the anchoring of membrane proteins for the actin cytoskeleton. Three functional domains exist in filamin: an N-terminal filamentous actin-binding domain, a C-terminal self-association domain, and a membrane glycoprotein-binding domain. Two transcript variants encoding different isoforms have been found for this gene. 2318 ENSG00000128591 FLNC filamin C
The protein encoded by this gene is a member of the keratin gene family. The type II cytokeratins consist of basic or neutral proteins which are arranged in pairs of heterotypic keratin chains coexpressed during differentiation of simple and stratified epithelial tissues. This type II cytokeratin is specifically expressed in the spinous and granular layers of the epidermis with family member KRT10 and mutations in these genes have been associated with bullous congenital ichthyosiform erythroderma. The type II cytokeratins are clustered in a region of chromosome 12q12-q13. 3848 ENSG00000167768 KRT1 keratin 1
This gene encodes one of six different actin proteins. Actins are highly conserved proteins that are involved in cell motility, structure, and integrity. This actin is a major constituent of the contractile apparatus and one of the two nonmuscle cytoskeletal actins. 60 ENSG00000075624 ACTB actin, beta
NA ENSG00000234961 ENSG00000234961 RP11-124N14.3 NA
This gene encodes a protein with similarity to follistatin, an activin-binding protein. It contains an FS module, a follistatin-like sequence containing 10 conserved cysteine residues. This gene product is thought to be an autoantigen associated with rheumatoid arthritis. 11167 ENSG00000163430 FSTL1 follistatin like 1
The cytoplasmic peripheral membrane protein encoded by this gene functions as a protein-tyrosine kinase substrate in microvilli. As a member of the ERM protein family, this protein serves as an intermediate between the plasma membrane and the actin cytoskeleton. This protein plays a key role in cell surface structure adhesion, migration and organization, and it has been implicated in various human cancers. A pseudogene located on chromosome 3 has been identified for this gene. Alternatively spliced variants have also been described for this gene. 7430 ENSG00000092820 EZR ezrin
This gene encodes a protein that enables the dissociation of paused ternary polymerase I transcription complexes from the 3’ end of pre-rRNA transcripts. This protein regulates rRNA transcription by promoting the dissociation of transcription complexes and the reinitiation of polymerase I on nascent rRNA transcripts. This protein also localizes to caveolae at the plasma membrane and is thought to play a critical role in the formation of caveolae and the stabilization of caveolins. This protein translocates from caveolae to the cytoplasm after insulin stimulation. Caveolae contain truncated forms of this protein and may be the site of phosphorylation-dependent proteolysis. This protein is also thought to modify lipid metabolism and insulin-regulated gene expression. Mutations in this gene result in a disorder characterized by generalized lipodystrophy and muscular dystrophy. 284119 ENSG00000177469 PTRF polymerase I and transcript release factor
Actins are highly conserved proteins that are involved in various types of cell motility. Polymerization of globular actin (G-actin) leads to a structural filament (F-actin) in the form of a two-stranded helix. Each actin can bind to four others. The protein encoded by this gene belongs to the actin family which is comprised of three main groups of actin isoforms, alpha, beta, and gamma. The alpha actins are found in muscle tissues and are a major constituent of the contractile apparatus. Defects in this gene have been associated with idiopathic dilated cardiomyopathy (IDC) and familial hypertrophic cardiomyopathy (FHC). 70 ENSG00000159251 ACTC1 actin, alpha, cardiac muscle 1
NA 27129 ENSG00000173641 HSPB7 heat shock protein family B (small) member 7
This gene is a member of the insulin-like growth factor binding protein (IGFBP) family and encodes a protein with an IGFBP domain and a thyroglobulin type-I domain. The protein binds both insulin-like growth factors (IGFs) I and II and circulates in the plasma in both glycosylated and non-glycosylated forms. Binding of this protein prolongs the half-life of the IGFs and alters their interaction with cell surface receptors. 3487 ENSG00000141753 IGFBP4 insulin like growth factor binding protein 4
This gene encodes a member of the globin superfamily and is expressed in skeletal and cardiac muscles. The encoded protein is a haemoprotein contributing to intracellular oxygen storage and transcellular facilitated diffusion of oxygen. At least three alternatively spliced transcript variants encoding the same protein have been reported. 4151 ENSG00000198125 MB myoglobin
Spectrins are principal components of a cell’s membrane-cytoskeleton and are composed of two alpha and two beta spectrin subunits. The protein encoded by this gene (SPTBN2) is called spectrin beta non-erythrocytic 2 or beta-III spectrin. It is related to, but distinct from, the beta-II spectrin gene which is also known as spectrin beta non-erythrocytic 1 (SPTBN1). SPTBN2 regulates the glutamate signaling pathway by stabilizing the glutamate transporter EAAT4 at the surface of the plasma membrane. Mutations in this gene cause a form of spinocerebellar ataxia, SCA5, that is characterized by neurodegeneration, progressive locomotor incoordination, dysarthria, and uncoordinated eye movements. 6712 ENSG00000173898 SPTBN2 spectrin beta, non-erythrocytic 2
NA ENSG00000225630 ENSG00000225630 MTND2P28 mitochondrially encoded NADH:ubiquinone oxidoreductase core subunit 2 pseudogene 28
The protein encoded by this gene is a member of the S100 family of proteins containing 2 EF-hand calcium-binding motifs. S100 proteins are localized in the cytoplasm and/or nucleus of a wide range of cells, and involved in the regulation of a number of cellular processes such as cell cycle progression and differentiation. S100 genes include at least 13 members which are located as a cluster on chromosome 1q21. This protein may function in the inhibition of casein kinase and altered expression of this protein is associated with the disease cystic fibrosis. This antimicrobial protein exhibits antifungal and antibacterial activity. 6280 ENSG00000163220 S100A9 S100 calcium binding protein A9
NA 715 ENSG00000159403 C1R complement C1r subcomponent
NA ENSG00000211895 ENSG00000211895 IGHA1 immunoglobulin heavy constant alpha 1
Mammalian lens crystallins are divided into alpha, beta, and gamma families. Alpha crystallins are composed of two gene products: alpha-A and alpha-B, for acidic and basic, respectively. Alpha crystallins can be induced by heat shock and are members of the small heat shock protein (HSP20) family. They act as molecular chaperones although they do not renature proteins and release them in the fashion of a true chaperone; instead they hold them in large soluble aggregates. Post-translational modifications decrease the ability to chaperone. These heterogeneous aggregates consist of 30-40 subunits; the alpha-A and alpha-B subunits have a 3:1 ratio, respectively. Two additional functions of alpha crystallins are an autokinase activity and participation in the intracellular architecture. The encoded protein has been identified as a moonlighting protein based on its ability to perform mechanistically distinct functions. Alpha-A and alpha-B gene products are differentially expressed; alpha-A is preferentially restricted to the lens and alpha-B is expressed widely in many tissues and organs. Elevated expression of alpha-B crystallin occurs in many neurological diseases; a missense mutation cosegregated in a family with a desmin-related myopathy. Alternative splicing results in multiple transcript variants. 1410 ENSG00000109846 CRYAB crystallin alpha B
This gene encodes fibronectin, a glycoprotein present in a soluble dimeric form in plasma, and in a dimeric or multimeric form at the cell surface and in extracellular matrix. The encoded preproprotein is proteolytically processed to generate the mature protein. Fibronectin is involved in cell adhesion and migration processes including embryogenesis, wound healing, blood coagulation, host defense, and metastasis. The gene has three regions subject to alternative splicing, with the potential to produce 20 different transcript variants, at least one of which encodes an isoform that undergoes proteolytic processing. The full-length nature of some variants has not been determined. 2335 ENSG00000115414 FN1 fibronectin 1
This gene product belongs to the 14-3-3 family of proteins which mediate signal transduction by binding to phosphoserine-containing proteins. This highly conserved protein family is found in both plants and mammals, and this protein is 100% identical to the mouse ortholog. It interacts with CDC25 phosphatases, RAF1 and IRS1 proteins, suggesting its role in diverse biochemical activities related to signal transduction, such as cell division and regulation of insulin sensitivity. It has also been implicated in the pathogenesis of small cell lung cancer. Two transcript variants, one protein-coding and the other non-protein-coding, have been found for this gene. 7531 ENSG00000108953 YWHAE tyrosine 3-monooxygenase/tryptophan 5-monooxygenase activation protein epsilon
This gene is a member of the matrix metalloproteinase (MMP) gene family, whose members are zinc-dependent enzymes capable of cleaving components of the extracellular matrix and molecules involved in signal transduction. The protein encoded by this gene is a gelatinase A, type IV collagenase, that contains three fibronectin type II repeats in its catalytic site that allow binding of denatured type IV and V collagen and elastin. Unlike most MMP family members, activation of this protein can occur on the cell membrane. This enzyme can be activated extracellularly by proteases, or intracellularly by its S-glutathiolation with no requirement for proteolytic removal of the pro-domain. This protein is thought to be involved in multiple pathways including roles in the nervous system, endometrial menstrual breakdown, regulation of vascularization, and metastasis. Mutations in this gene have been associated with Winchester syndrome and Nodulosis-Arthropathy-Osteolysis (NAO) syndrome. Alternative splicing results in multiple transcript variants encoding different isoforms. 4313 ENSG00000087245 MMP2 matrix metallopeptidase 2
This gene is a member of the N-myc downregulated gene family which belongs to the alpha/beta hydrolase superfamily. The protein encoded by this gene is a cytoplasmic protein that may play a role in neurite outgrowth. This gene may be involved in glioblastoma carcinogenesis. Several alternatively spliced transcript variants of this gene have been described, but the full-length nature of some of these variants has not been determined. 57447 ENSG00000165795 NDRG2 NDRG family member 2
Complement component C3 plays a central role in the activation of the complement system. Its activation is required for both classical and alternative complement activation pathways. The encoded preproprotein is proteolytically processed to generate alpha and beta subunits that form the mature protein, which is then further processed to generate numerous peptide products. The C3a peptide, also known as the C3a anaphylatoxin, modulates inflammation and possesses antimicrobial activity. Mutations in this gene are associated with atypical hemolytic uremic syndrome and age-related macular degeneration in human patients. 718 ENSG00000125730 C3 complement component 3
This gene encodes a cell surface tyrosine kinase receptor for members of the platelet-derived growth factor family. These growth factors are mitogens for cells of mesenchymal origin. The identity of the growth factor bound to a receptor monomer determines whether the functional receptor is a homodimer or a heterodimer, composed of both platelet-derived growth factor receptor alpha and beta polypeptides. This gene is flanked on chromosome 5 by the genes for granulocyte-macrophage colony-stimulating factor and macrophage-colony stimulating factor receptor; all three genes may be implicated in the 5-q syndrome. A translocation between chromosomes 5 and 12, that fuses this gene to that of the translocation, ETV6, leukemia gene, results in chronic myeloproliferative disorder with eosinophilia. 5159 ENSG00000113721 PDGFRB platelet derived growth factor receptor beta
NA 100129518 ENSG00000112096 LOC100129518 uncharacterized LOC100129518
This gene is a member of the iron/manganese superoxide dismutase family. It encodes a mitochondrial protein that forms a homotetramer and binds one manganese ion per subunit. This protein binds to the superoxide byproducts of oxidative phosphorylation and converts them to hydrogen peroxide and diatomic oxygen. Mutations in this gene have been associated with idiopathic cardiomyopathy (IDC), premature aging, sporadic motor neuron disease, and cancer. Alternative splicing of this gene results in multiple transcript variants. A related pseudogene has been identified on chromosome 1. 6648 ENSG00000112096 SOD2 superoxide dismutase 2, mitochondrial
The protein encoded by this gene is a mitogen that is secreted by vascular endothelial cells. The encoded protein plays a role in chondrocyte proliferation and differentiation, cell adhesion in many cell types, and is related to platelet-derived growth factor. Certain polymorphisms in this gene have been linked with a higher incidence of systemic sclerosis. 1490 ENSG00000118523 CTGF connective tissue growth factor
This gene encodes a serine protease, which is a major constituent of the human complement subcomponent C1. C1s associates with two other complement components C1r and C1q in order to yield the first component of the serum complement system. Defects in this gene are the cause of selective C1s deficiency. 716 ENSG00000182326 C1S complement component 1, s subcomponent
The protein encoded by this gene is the tropomyosin-binding subunit of the troponin complex, which is located on the thin filament of striated muscles and regulates muscle contraction in response to alterations in intracellular calcium ion concentration. Mutations in this gene have been associated with familial hypertrophic cardiomyopathy as well as with dilated cardiomyopathy. Transcripts for this gene undergo alternative splicing that results in many tissue-specific isoforms; however, the full-length nature of some of these variants has not yet been determined. 7139 ENSG00000118194 TNNT2 troponin T2, cardiac type
The protein encoded by this gene is a member of the keratin gene family. The keratins are intermediate filament proteins responsible for the structural integrity of epithelial cells and are subdivided into cytokeratins and hair keratins. Most of the type I cytokeratins consist of acidic proteins which are arranged in pairs of heterotypic keratin chains. This type I cytokeratin is paired with keratin 4 and expressed in the suprabasal layers of non-cornified stratified epithelia. Mutations in this gene and keratin 4 have been associated with the autosomal dominant disorder White Sponge Nevus. The type I cytokeratins are clustered in a region of chromosome 17q21.2. Alternative splicing of this gene results in multiple transcript variants; however, not all variants have been described. 3860 ENSG00000171401 KRT13 keratin 13
Members of the F-box protein family, such as FBXL16, are characterized by an approximately 40-amino acid F-box motif. SCF complexes, formed by SKP1 (MIM 601434), cullin (see CUL1; MIM 603134), and F-box proteins, act as protein-ubiquitin ligases. F-box proteins interact with SKP1 through the F box, and they interact with ubiquitination targets through other protein interaction domains (Jin et al., 2004 [PubMed 15520277]). 146330 ENSG00000127585 FBXL16 F-box and leucine rich repeat protein 16
This gene encodes the alpha chain of type XII collagen, a member of the FACIT (fibril-associated collagens with interrupted triple helices) collagen family. Type XII collagen is a homotrimer found in association with type I collagen, an association that is thought to modify the interactions between collagen I fibrils and the surrounding matrix. Alternatively spliced transcript variants encoding different isoforms have been identified. 1303 ENSG00000111799 COL12A1 collagen type XII alpha 1 chain
This gene encodes a PDZ domain-containing protein. PDZ motifs are modular protein-protein interaction domains consisting of 80-120 amino acid residues. PDZ domain-containing proteins interact with each other in cytoskeletal assembly or with other proteins involved in targeting and clustering of membrane proteins. The protein encoded by this gene interacts with alpha-actinin-2 through its N-terminal PDZ domain and with protein kinase C via its C-terminal LIM domains. The LIM domain is a cysteine-rich motif defined by 50-60 amino acids containing two zinc-binding modules. This protein also interacts with all three members of the myozenin family. Mutations in this gene have been associated with myofibrillar myopathy and dilated cardiomyopathy. Alternatively spliced transcript variants encoding different isoforms have been identified; all isoforms have N-terminal PDZ domains while only longer isoforms (1, 2 and 5) have C-terminal LIM domains. 11155 ENSG00000122367 LDB3 LIM domain binding 3
This gene encodes a calmodulin- and actin-binding protein that plays an essential role in the regulation of smooth muscle and nonmuscle contraction. The conserved domain of this protein possesses the binding activities to Ca(2+)-calmodulin, actin, tropomyosin, myosin, and phospholipids. This protein is a potent inhibitor of the actin-tropomyosin activated myosin MgATPase, and serves as a mediating factor for Ca(2+)-dependent inhibition of smooth muscle contraction. Alternative splicing of this gene results in multiple transcript variants encoding distinct isoforms. 800 ENSG00000122786 CALD1 caldesmon 1
This gene encodes one of the six subunits of type IV collagen, the major structural component of basement membranes. The C-terminal portion of the protein, known as canstatin, is an inhibitor of angiogenesis and tumor growth. Like the other members of the type IV collagen gene family, this gene is organized in a head-to-head conformation with another type IV collagen gene so that each gene pair shares a common promoter. 1284 ENSG00000134871 COL4A2 collagen type IV alpha 2
The product of this gene belongs to the integrin alpha chain family. Integrins are heterodimeric integral membrane proteins composed of an alpha subunit and a beta subunit that function in cell surface adhesion and signaling. The encoded preproprotein is proteolytically processed to generate light and heavy chains that comprise the alpha 5 subunit. This subunit associates with the beta 1 subunit to form a fibronectin receptor. This integrin may promote tumor invasion, and higher expression of this gene may be correlated with shorter survival time in lung cancer patients. Note that the integrin alpha 5 and integrin alpha V subunits are encoded by distinct genes. 3678 ENSG00000161638 ITGA5 integrin subunit alpha 5
The human alpha globin gene cluster located on chromosome 16 spans about 30 kb and includes seven loci: 5’- zeta - pseudozeta - mu - pseudoalpha-1 - alpha-2 - alpha-1 - theta - 3’. The alpha-2 (HBA2) and alpha-1 (HBA1) coding sequences are identical. These genes differ slightly over the 5’ untranslated regions and the introns, but they differ significantly over the 3’ untranslated regions. Two alpha chains plus two beta chains constitute HbA, which in normal adult life comprises about 97% of the total hemoglobin; alpha chains combine with delta chains to constitute HbA-2, which with HbF (fetal hemoglobin) makes up the remaining 3% of adult hemoglobin. Alpha thalassemias result from deletions of each of the alpha genes as well as deletions of both HBA2 and HBA1; some nondeletion alpha thalassemias have also been reported. 3040 ENSG00000188536 HBA2 hemoglobin subunit alpha 2
NA 151887 ENSG00000091986 CCDC80 coiled-coil domain containing 80
The obscurin gene spans more than 150 kb, contains over 80 exons and encodes a protein of approximately 720 kDa. The encoded protein contains 68 Ig domains, 2 fibronectin domains, 1 calcium/calmodulin-binding domain, 1 RhoGEF domain with an associated PH domain, and 2 serine-threonine kinase domains. This protein belongs to the family of giant sarcomeric signaling proteins that includes titin and nebulin, and may have a role in the organization of myofibrils during assembly and may mediate interactions between the sarcoplasmic reticulum and myofibrils. Alternatively spliced transcript variants encoding different isoforms have been identified. 84033 ENSG00000154358 OBSCN obscurin, cytoskeletal calmodulin and titin-interacting RhoGEF
The protein encoded by this gene is a member of the S100 family of proteins containing 2 EF-hand calcium-binding motifs. S100 proteins are localized in the cytoplasm and/or nucleus of a wide range of cells, and involved in the regulation of a number of cellular processes such as cell cycle progression and differentiation. S100 genes include at least 13 members which are located as a cluster on chromosome 1q21. This protein may function in the inhibition of casein kinase and as a cytokine. Altered expression of this protein is associated with the disease cystic fibrosis. Multiple transcript variants encoding different isoforms have been found for this gene. 6279 ENSG00000143546 S100A8 S100 calcium binding protein A8
This gene encodes the pro-alpha2 chain of type I collagen whose triple helix comprises two alpha1 chains and one alpha2 chain. Type I is a fibril-forming collagen found in most connective tissues and is abundant in bone, cornea, dermis and tendon. Mutations in this gene are associated with osteogenesis imperfecta types I-IV, Ehlers-Danlos syndrome type VIIB, recessive Ehlers-Danlos syndrome Classical type, idiopathic osteoporosis, and atypical Marfan syndrome. Symptoms associated with mutations in this gene, however, tend to be less severe than mutations in the gene for the alpha1 chain of type I collagen (COL1A1) reflecting the different role of alpha2 chains in matrix integrity. Three transcripts, resulting from the use of alternate polyadenylation signals, have been identified for this gene. 1278 ENSG00000164692 COL1A2 collagen type I alpha 2 chain
The protein encoded by this gene is a member of the interleukin 1 cytokine family. This protein inhibits the activities of interleukin 1, alpha (IL1A) and interleukin 1, beta (IL1B), and modulates a variety of interleukin 1 related immune and inflammatory responses. This gene and five other closely related cytokine genes form a gene cluster spanning approximately 400 kb on chromosome 2. A polymorphism of this gene is reported to be associated with increased risk of osteoporotic fractures and gastric cancer. Several alternatively spliced transcript variants encoding distinct isoforms have been reported. 3557 ENSG00000136689 IL1RN interleukin 1 receptor antagonist
The protein encoded by this gene belongs to the TRIM protein family. It has multiple zinc finger motifs and a leucine zipper motif. It has been proposed to form homo- or heterodimers which are involved in nucleic acid binding. Thus, it may act as a transcriptional regulatory factor involved in carcinogenesis and/or differentiation. It may also function in the suppression of radiosensitivity since it is associated with ataxia telangiectasia phenotype. 23650 ENSG00000137699 TRIM29 tripartite motif containing 29
This gene encodes a large abundant protein of striated muscle. The product of this gene is divided into two regions, a N-terminal I-band and a C-terminal A-band. The I-band, which is the elastic part of the molecule, contains two regions of tandem immunoglobulin domains on either side of a PEVK region that is rich in proline, glutamate, valine and lysine. The A-band, which is thought to act as a protein-ruler, contains a mixture of immunoglobulin and fibronectin repeats, and possesses kinase activity. An N-terminal Z-disc region and a C-terminal M-line region bind to the Z-line and M-line of the sarcomere, respectively, so that a single titin molecule spans half the length of a sarcomere. Titin also contains binding sites for muscle associated proteins so it serves as an adhesion template for the assembly of contractile machinery in muscle cells. It has also been identified as a structural protein for chromosomes. Alternative splicing of this gene results in multiple transcript variants. Considerable variability exists in the I-band, the M-line and the Z-disc regions of titin. Variability in the I-band region contributes to the differences in elasticity of different titin isoforms and, therefore, to the differences in elasticity of different muscle types. Mutations in this gene are associated with familial hypertrophic cardiomyopathy 9, and autoantibodies to titin are produced in patients with the autoimmune disease scleroderma. 7273 ENSG00000155657 TTN titin
NA 7538 ENSG00000128016 ZFP36 ZFP36 ring finger protein
The protein encoded by this gene is a transformation and shape-change sensitive actin cross-linking/gelling protein found in fibroblasts and smooth muscle. Its expression is down-regulated in many cell lines, and this down-regulation may be an early and sensitive marker for the onset of transformation. A functional role of this protein is unclear. Two transcript variants encoding the same protein have been found for this gene. 6876 ENSG00000149591 TAGLN transgelin
This gene encodes an RGD-containing protein that binds to type I, II and IV collagens. The RGD motif is found in many extracellular matrix proteins modulating cell adhesion and serves as a ligand recognition sequence for several integrins. This protein plays a role in cell-collagen interactions and may be involved in endochondral bone formation in cartilage. The protein is induced by transforming growth factor-beta and acts to inhibit cell adhesion. Mutations in this gene are associated with multiple types of corneal dystrophy. 7045 ENSG00000120708 TGFBI transforming growth factor beta induced
This gene encodes the perlecan protein, which consists of a core protein to which three long chains of glycosaminoglycans (heparan sulfate or chondroitin sulfate) are attached. The perlecan protein is a large multidomain proteoglycan that binds to and cross-links many extracellular matrix components and cell-surface molecules. It has been shown that this protein interacts with laminin, prolargin, collagen type IV, FGFBP1, FBLN2, FGF7 and transthyretin, etc., and it plays essential roles in multiple biological activities. Perlecan is a key component of the vascular extracellular matrix, where it helps to maintain the endothelial barrier function. It is a potent inhibitor of smooth muscle cell proliferation and is thus thought to help maintain vascular homeostasis. It can also promote growth factor (e.g., FGF2) activity and thus stimulate endothelial growth and re-generation. It is a major component of basement membranes, where it is involved in the stabilization of other molecules as well as being involved with glomerular permeability to macromolecules and cell adhesion. Mutations in this gene cause Schwartz-Jampel syndrome type 1, Silverman-Handmaker type of dyssegmental dysplasia, and tardive dyskinesia. Alternative splicing of this gene results in multiple transcript variants. 3339 ENSG00000142798 HSPG2 heparan sulfate proteoglycan 2
C7 is a component of the complement system. It participates in the formation of Membrane Attack Complex (MAC). People with C7 deficiency are prone to bacterial infection. 730 ENSG00000112936 C7 complement component 7
This gene encodes a serine/threonine protein kinase that localizes to mitochondria. It is thought to protect cells from stress-induced mitochondrial dysfunction. Mutations in this gene cause one form of autosomal recessive early-onset Parkinson disease. 65018 ENSG00000158828 PINK1 PTEN induced putative kinase 1
This gene encodes a conventional non-muscle myosin; this protein should not be confused with the unconventional myosin-9a or 9b (MYO9A or MYO9B). The encoded protein is a myosin IIA heavy chain that contains an IQ domain and a myosin head-like domain which is involved in several important functions, including cytokinesis, cell motility and maintenance of cell shape. Defects in this gene have been associated with non-syndromic sensorineural deafness autosomal dominant type 17, Epstein syndrome, Alport syndrome with macrothrombocytopenia, Sebastian syndrome, Fechtner syndrome and macrothrombocytopenia with progressive sensorineural deafness. 4627 ENSG00000100345 MYH9 myosin, heavy chain 9, non-muscle
This gene encodes a member of the heat shock protein 70 family, which contains both heat-inducible and constitutively expressed members. This protein belongs to the latter group, which are also referred to as heat-shock cognate proteins. It functions as a chaperone, and binds to nascent polypeptides to facilitate correct folding. It also functions as an ATPase in the disassembly of clathrin-coated vesicles during transport of membrane components through the cell. Alternatively spliced transcript variants encoding different isoforms have been found for this gene. 3312 ENSG00000109971 HSPA8 heat shock protein family A (Hsp70) member 8
This gene encodes a motor protein that transports mitochondria and synaptic vesicle precursors. Mutations in this gene cause Charcot-Marie-Tooth disease, type 2A1. 23095 ENSG00000054523 KIF1B kinesin family member 1B
This gene encodes a cytoskeletal protein that is concentrated in areas of cell-substratum and cell-cell contacts. The encoded protein plays a significant role in the assembly of actin filaments and in spreading and migration of various cell types, including fibroblasts and osteoclasts. It codistributes with integrins in the cell surface membrane in order to assist in the attachment of adherent cells to extracellular matrices and of lymphocytes to other cells. The N-terminus of this protein contains elements for localization to cell-extracellular matrix junctions. The C-terminus contains binding sites for proteins such as beta-1-integrin, actin, and vinculin. 7094 ENSG00000137076 TLN1 talin 1
The protein encoded by this gene is localized to the nucleus of endothelial cells and is induced by IL-1 and TNF-alpha stimulation. Studies in rat cardiomyocytes suggest that this gene functions as a transcription factor. Interactions between this protein and the sarcomeric proteins myopalladin and titin suggest that it may also be involved in the myofibrillar stretch-sensor system. 27063 ENSG00000148677 ANKRD1 ankyrin repeat domain 1
This gene encodes a member of the phosphatidylethanolamine-binding family of proteins and has been shown to modulate multiple signaling pathways, including the MAP kinase (MAPK), NF-kappa B, and glycogen synthase kinase-3 (GSK-3) signaling pathways. The encoded protein can be further processed to form a smaller cleavage product, hippocampal cholinergic neurostimulating peptide (HCNP), which may be involved in neural development. This gene has been implicated in numerous human cancers and may act as a metastasis suppressor gene. Multiple pseudogenes of this gene have been identified in the genome. 5037 ENSG00000089220 PEBP1 phosphatidylethanolamine binding protein 1
This gene encodes a member of the myosin-binding protein C family. Myosin-binding protein C family members are myosin-associated proteins found in the cross-bridge-bearing zone (C region) of A bands in striated muscle. The encoded protein is the slow skeletal muscle isoform of myosin-binding protein C and plays an important role in muscle contraction by recruiting muscle-type creatine kinase to myosin filaments. Mutations in this gene are associated with distal arthrogryposis type I. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene. 4604 ENSG00000196091 MYBPC1 myosin binding protein C, slow type
This gene is a member of the cytochrome b(561) family that encodes an iron-regulated protein. It is highly expressed in the duodenal brush border membrane. It has ferric reductase activity and is believed to play a physiological role in dietary iron absorption. 79901 ENSG00000071967 CYBRD1 cytochrome b reductase 1
NA 58498 ENSG00000106631 MYL7 myosin light chain 7
The protein encoded by this gene is a member of the S100 family of proteins containing 2 EF-hand calcium-binding motifs. S100 proteins are localized in the cytoplasm and/or nucleus of a wide range of cells, and involved in the regulation of a number of cellular processes such as cell cycle progression and differentiation. S100 genes include at least 13 members which are located as a cluster on chromosome 1q21. This protein may function in stimulation of Ca2+-dependent insulin release, stimulation of prolactin secretion, and exocytosis. Chromosomal rearrangements and altered expression of this gene have been implicated in melanoma. 6277 ENSG00000197956 S100A6 S100 calcium binding protein A6
Myosin, a structural component of muscle, consists of two heavy chains and four light chains. The protein encoded by this gene is a myosin light chain that may regulate muscle contraction by modulating the ATPase activity of myosin heads. The encoded protein binds calcium and is activated by myosin light chain kinase. Two transcript variants encoding different isoforms have been found for this gene. 10398 ENSG00000101335 MYL9 myosin light chain 9
The protein encoded by this gene is a component of desmosomes and of the epidermal cornified envelope in keratinocytes. The N-terminal domain of this protein interacts with the plasma membrane and its C-terminus interacts with intermediate filaments. Through its rod domain, this protein forms complexes with envoplakin. This protein may serve as a link between the cornified envelope and desmosomes as well as intermediate filaments. AKT1/PKB, a protein kinase mediating a variety of cell growth and survival signaling processes, is reported to interact with this protein, suggesting a possible role for this protein as a localization signal in AKT1-mediated signaling. 5493 ENSG00000118898 PPL periplakin
Muscle myosin is a hexameric protein containing 2 heavy chain subunits, 2 alkali light chain subunits, and 2 regulatory light chain subunits. This gene encodes the beta (or slow) heavy chain subunit of cardiac myosin. It is expressed predominantly in normal human ventricle. It is also expressed in skeletal muscle tissues rich in slow-twitch type I muscle fibers. Changes in the relative abundance of this protein and the alpha (or fast) heavy subunit of cardiac myosin correlate with the contractile velocity of cardiac muscle. Its expression is also altered during thyroid hormone depletion and hemodynamic overloading. Mutations in this gene are associated with familial hypertrophic cardiomyopathy, myosin storage myopathy, dilated cardiomyopathy, and Laing early-onset distal myopathy. 4625 ENSG00000092054 MYH7 myosin, heavy chain 7, cardiac muscle, beta
This gene encodes a member of the Ser/Thr protein kinase family and the TGFB receptor subfamily. The encoded protein is a transmembrane protein that has a protein kinase domain, forms a heterodimeric complex with another receptor protein, and binds TGF-beta. This receptor/ligand complex phosphorylates proteins, which then enter the nucleus and regulate the transcription of a subset of genes related to cell proliferation. Mutations in this gene have been associated with Marfan Syndrome, Loeys-Dietz Aortic Aneurysm Syndrome, and the development of various types of tumors. Alternatively spliced transcript variants encoding different isoforms have been characterized. 7048 ENSG00000163513 TGFBR2 transforming growth factor beta receptor 2
The scaffolding protein encoded by this gene is the main component of the caveolae plasma membranes found in most cell types. The protein links integrin subunits to the tyrosine kinase FYN, an initiating step in coupling integrins to the Ras-ERK pathway and promoting cell cycle progression. The gene is a tumor suppressor gene candidate and a negative regulator of the Ras-p42/44 mitogen-activated kinase cascade. Caveolin 1 and caveolin 2 are located next to each other on chromosome 7 and express colocalizing proteins that form a stable hetero-oligomeric complex. Mutations in this gene have been associated with Berardinelli-Seip congenital lipodystrophy. Alternatively spliced transcripts encode alpha and beta isoforms of caveolin 1. 857 ENSG00000105974 CAV1 caveolin 1
This gene is a member of the N-myc downregulated gene family which belongs to the alpha/beta hydrolase superfamily. The protein encoded by this gene is a cytoplasmic protein that is required for cell cycle progression and survival in primary astrocytes and may be involved in the regulation of mitogenic signalling in vascular smooth muscle cells. Alternative splicing results in multiple transcripts encoding different isoforms. 65009 ENSG00000103034 NDRG4 NDRG family member 4
NA ENSG00000211896 ENSG00000211896 IGHG1 immunoglobulin heavy constant gamma 1 (G1m marker)
This gene encodes a member of the heat shock protein 90 family; these proteins are involved in signal transduction, protein folding and degradation and morphological evolution. This gene encodes the constitutive form of the cytosolic 90 kDa heat-shock protein and is thought to play a role in gastric apoptosis and inflammation. Alternative splicing results in multiple transcript variants. Pseudogenes have been identified on multiple chromosomes. 3326 ENSG00000096384 HSP90AB1 heat shock protein 90kDa alpha family class B member 1
This gene encodes a member of the fibulin family of extracellular matrix glycoproteins. Like all members of this family, the encoded protein contains tandemly repeated epidermal growth factor-like repeats followed by a C-terminus fibulin-type domain. This gene is upregulated in malignant gliomas and may play a role in the aggressive nature of these tumors. Mutations in this gene are associated with Doyne honeycomb retinal dystrophy. Alternatively spliced transcript variants that encode the same protein have been described. 2202 ENSG00000115380 EFEMP1 EGF containing fibulin like extracellular matrix protein 1
NA 5502 ENSG00000135447 PPP1R1A protein phosphatase 1 regulatory inhibitor subunit 1A
NA 79085 ENSG00000125648 SLC25A23 solute carrier family 25 member 23
The protein encoded by this gene is a homeodomain protein that lacks certain conserved residues required for DNA binding. It was reported that choriocarcinoma cell lines and tissues failed to express this gene, which suggested the possible involvement of this gene in malignant conversion of placental trophoblasts. Studies in mice suggest that this protein may interact with serum response factor (SRF) and modulate SRF-dependent cardiac-specific gene expression and cardiac development. Multiple alternatively spliced transcript variants have been identified for this gene. 84525 ENSG00000171476 HOPX HOP homeobox
NA 6515 ENSG00000059804 SLC2A3 solute carrier family 2 member 3
The protein encoded by this gene is an inducible molecular chaperone that functions as a homodimer. The encoded protein aids in the proper folding of specific target proteins by use of an ATPase activity that is modulated by co-chaperones. Two transcript variants encoding different isoforms have been found for this gene. 3320 ENSG00000080824 HSP90AA1 heat shock protein 90kDa alpha family class A member 1
# Save the Ensembl gene IDs queried for factor 12, one ID per line.
write.table(as.character(out$query), paste0("../utilities/GTEX2013_sparse_load_sqrt/gene_names_clus_", 12, ".txt"),
            col.names = FALSE, row.names = FALSE, quote = FALSE)
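
The query-then-write step is repeated for every factor with only the factor index changing, so it could be wrapped in a small helper. The sketch below is one possible way to do that and is not part of the original analysis: `write_factor_genes` is a hypothetical name, and `toy_out` is a made-up stand-in for a `queryMany()` result so that the example runs without a network call. Unlike the `write.table()` call above, the helper drops repeated IDs, on the assumption that each queried gene should be listed once; a single Ensembl ID can come back as more than one record (in the Factor 13 table, ENSG00000237298 appears twice).

```r
# Hypothetical helper (not in the original analysis): write the unique
# query IDs of one factor to "gene_names_clus_<factor_id>.txt" in out_dir.
write_factor_genes <- function(out, factor_id, out_dir) {
  ids <- unique(as.character(out$query))  # drop IDs returned more than once
  path <- file.path(out_dir, paste0("gene_names_clus_", factor_id, ".txt"))
  writeLines(ids, path)                   # one ID per line, no quotes or header
  invisible(path)
}

# Toy stand-in for a queryMany() result; the repeated ID mimics a
# one-to-many mapping from an Ensembl ID to annotation records.
toy_out <- data.frame(
  query  = c("ENSG00000155657", "ENSG00000237298", "ENSG00000237298"),
  symbol = c("TTN", "LOC101927055", "TTN-AS1"),
  stringsAsFactors = FALSE
)

path <- write_factor_genes(toy_out, 13, tempdir())
readLines(path)
```

In the actual workflow the first argument would be the `out` object returned by `mygene::queryMany()` and `out_dir` would be the `../utilities/GTEX2013_sparse_load_sqrt` directory used above. If the repeated IDs should be kept, removing the `unique()` call gives the same file contents as the original code.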

Factor 13 Annotations

out <- mygene::queryMany(gene_list[13,],  scopes="ensembl.gene", fields=c("name", "summary", "symbol"), species="human");
## Finished
## Pass returnall=TRUE to return lists of duplicate or missing query terms.
kable(as.data.frame(out))
name query symbol summary X_id
desmin ENSG00000175084 DES This gene encodes a muscle-specific class III intermediate filament. Homopolymers of this protein form a stable intracytoplasmic filamentous network connecting myofibrils to each other and to the plasma membrane. Mutations in this gene are associated with desmin-related myopathy, a familial cardiac and skeletal myopathy (CSM), and with distal myopathies. 1674
hemoglobin subunit beta ENSG00000244734 HBB The alpha (HBA) and beta (HBB) loci determine the structure of the 2 types of polypeptide chains in adult hemoglobin, Hb A. The normal adult hemoglobin tetramer consists of two alpha chains and two beta chains. Mutant beta globin causes sickle cell anemia. Absence of beta chain causes beta-zero-thalassemia. Reduced amounts of detectable beta globin causes beta-plus-thalassemia. The order of the genes in the beta-globin cluster is 5’-epsilon – gamma-G – gamma-A – delta – beta–3’. 3043
actinin alpha 2 ENSG00000077522 ACTN2 Alpha actinins belong to the spectrin gene superfamily which represents a diverse group of cytoskeletal proteins, including the alpha and beta spectrins and dystrophins. Alpha actinin is an actin-binding protein with multiple roles in different cell types. In nonmuscle cells, the cytoskeletal isoform is found along microfilament bundles and adherens-type junctions, where it is involved in binding actin to the membrane. In contrast, skeletal, cardiac, and smooth muscle isoforms are localized to the Z-disc and analogous dense bodies, where they help anchor the myofibrillar actin filaments. This gene encodes a muscle-specific, alpha actinin isoform that is expressed in both skeletal and cardiac muscles. Several transcript variants encoding different isoforms have been found for this gene. 88
keratin 13 ENSG00000171401 KRT13 The protein encoded by this gene is a member of the keratin gene family. The keratins are intermediate filament proteins responsible for the structural integrity of epithelial cells and are subdivided into cytokeratins and hair keratins. Most of the type I cytokeratins consist of acidic proteins which are arranged in pairs of heterotypic keratin chains. This type I cytokeratin is paired with keratin 4 and expressed in the suprabasal layers of non-cornified stratified epithelia. Mutations in this gene and keratin 4 have been associated with the autosomal dominant disorder White Sponge Nevus. The type I cytokeratins are clustered in a region of chromosome 17q21.2. Alternative splicing of this gene results in multiple transcript variants; however, not all variants have been described. 3860
myosin, heavy chain 7, cardiac muscle, beta ENSG00000092054 MYH7 Muscle myosin is a hexameric protein containing 2 heavy chain subunits, 2 alkali light chain subunits, and 2 regulatory light chain subunits. This gene encodes the beta (or slow) heavy chain subunit of cardiac myosin. It is expressed predominantly in normal human ventricle. It is also expressed in skeletal muscle tissues rich in slow-twitch type I muscle fibers. Changes in the relative abundance of this protein and the alpha (or fast) heavy subunit of cardiac myosin correlate with the contractile velocity of cardiac muscle. Its expression is also altered during thyroid hormone depletion and hemodynamic overloading. Mutations in this gene are associated with familial hypertrophic cardiomyopathy, myosin storage myopathy, dilated cardiomyopathy, and Laing early-onset distal myopathy. 4625
ATPase sarcoplasmic/endoplasmic reticulum Ca2+ transporting 2 ENSG00000174437 ATP2A2 This gene encodes one of the SERCA Ca(2+)-ATPases, which are intracellular pumps located in the sarcoplasmic or endoplasmic reticula of muscle cells. This enzyme catalyzes the hydrolysis of ATP coupled with the translocation of calcium from the cytosol into the sarcoplasmic reticulum lumen, and is involved in regulation of the contraction/relaxation cycle. Mutations in this gene cause Darier-White disease, also known as keratosis follicularis, an autosomal dominant skin disorder characterized by loss of adhesion between epidermal cells and abnormal keratinization. Alternative splicing results in multiple transcript variants encoding different isoforms. 488
actin, alpha 1, skeletal muscle ENSG00000143632 ACTA1 The product encoded by this gene belongs to the actin family of proteins, which are highly conserved proteins that play a role in cell motility, structure and integrity. Alpha, beta and gamma actin isoforms have been identified, with alpha actins being a major constituent of the contractile apparatus, while beta and gamma actins are involved in the regulation of cell motility. This actin is an alpha actin that is found in skeletal muscle. Mutations in this gene cause nemaline myopathy type 3, congenital myopathy with excess of thin myofilaments, congenital myopathy with cores, and congenital myopathy with fiber-type disproportion, diseases that lead to muscle fiber defects. 58
decorin ENSG00000011465 DCN This gene encodes a member of the small leucine-rich proteoglycan family of proteins. Alternative splicing results in multiple transcript variants, at least one of which encodes a preproprotein that is proteolytically processed to generate the mature protein. This protein plays a role in collagen fibril assembly. Binding of this protein to multiple cell surface receptors mediates its role in tumor suppression, including a stimulatory effect on autophagy and inflammation and an inhibitory effect on angiogenesis and tumorigenesis. This gene and the related gene biglycan are thought to be the result of a gene duplication. Mutations in this gene are associated with congenital stromal corneal dystrophy in human patients. 1634
titin ENSG00000155657 TTN This gene encodes a large abundant protein of striated muscle. The product of this gene is divided into two regions, a N-terminal I-band and a C-terminal A-band. The I-band, which is the elastic part of the molecule, contains two regions of tandem immunoglobulin domains on either side of a PEVK region that is rich in proline, glutamate, valine and lysine. The A-band, which is thought to act as a protein-ruler, contains a mixture of immunoglobulin and fibronectin repeats, and possesses kinase activity. An N-terminal Z-disc region and a C-terminal M-line region bind to the Z-line and M-line of the sarcomere, respectively, so that a single titin molecule spans half the length of a sarcomere. Titin also contains binding sites for muscle associated proteins so it serves as an adhesion template for the assembly of contractile machinery in muscle cells. It has also been identified as a structural protein for chromosomes. Alternative splicing of this gene results in multiple transcript variants. Considerable variability exists in the I-band, the M-line and the Z-disc regions of titin. Variability in the I-band region contributes to the differences in elasticity of different titin isoforms and, therefore, to the differences in elasticity of different muscle types. Mutations in this gene are associated with familial hypertrophic cardiomyopathy 9, and autoantibodies to titin are produced in patients with the autoimmune disease scleroderma. 7273
creatine kinase, M-type ENSG00000104879 CKM The protein encoded by this gene is a cytoplasmic enzyme involved in energy homeostasis and is an important serum marker for myocardial infarction. The encoded protein reversibly catalyzes the transfer of phosphate between ATP and various phosphogens such as creatine phosphate. It acts as a homodimer in striated muscle as well as in other tissues, and as a heterodimer with a similar brain isozyme in heart. The encoded protein is a member of the ATP:guanido phosphotransferase protein family. 1158
filamin C ENSG00000128591 FLNC This gene encodes one of three related filamin genes, specifically gamma filamin. These filamin proteins crosslink actin filaments into orthogonal networks in cortical cytoplasm and participate in the anchoring of membrane proteins for the actin cytoskeleton. Three functional domains exist in filamin: an N-terminal filamentous actin-binding domain, a C-terminal self-association domain, and a membrane glycoprotein-binding domain. Two transcript variants encoding different isoforms have been found for this gene. 2318
myoglobin ENSG00000198125 MB This gene encodes a member of the globin superfamily and is expressed in skeletal and cardiac muscles. The encoded protein is a haemoprotein contributing to intracellular oxygen storage and transcellular facilitated diffusion of oxygen. At least three alternatively spliced transcript variants encoding the same protein have been reported. 4151
actin, beta ENSG00000075624 ACTB This gene encodes one of six different actin proteins. Actins are highly conserved proteins that are involved in cell motility, structure, and integrity. This actin is a major constituent of the contractile apparatus and one of the two nonmuscle cytoskeletal actins. 60
fibronectin 1 ENSG00000115414 FN1 This gene encodes fibronectin, a glycoprotein present in a soluble dimeric form in plasma, and in a dimeric or multimeric form at the cell surface and in extracellular matrix. The encoded preproprotein is proteolytically processed to generate the mature protein. Fibronectin is involved in cell adhesion and migration processes including embryogenesis, wound healing, blood coagulation, host defense, and metastasis. The gene has three regions subject to alternative splicing, with the potential to produce 20 different transcript variants, at least one of which encodes an isoform that undergoes proteolytic processing. The full-length nature of some variants has not been determined. 2335
uncharacterized LOC101927055 ENSG00000237298 LOC101927055 NA 101927055
TTN antisense RNA 1 ENSG00000237298 TTN-AS1 NA 100506866
myosin binding protein C, slow type ENSG00000196091 MYBPC1 This gene encodes a member of the myosin-binding protein C family. Myosin-binding protein C family members are myosin-associated proteins found in the cross-bridge-bearing zone (C region) of A bands in striated muscle. The encoded protein is the slow skeletal muscle isoform of myosin-binding protein C and plays an important role in muscle contraction by recruiting muscle-type creatine kinase to myosin filaments. Mutations in this gene are associated with distal arthrogryposis type I. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene. 4604
keratin 4 ENSG00000170477 KRT4 The protein encoded by this gene is a member of the keratin gene family. The type II cytokeratins consist of basic or neutral proteins which are arranged in pairs of heterotypic keratin chains coexpressed during differentiation of simple and stratified epithelial tissues. This type II cytokeratin is specifically expressed in differentiated layers of the mucosal and esophageal epithelia with family member KRT13. Mutations in these genes have been associated with White Sponge Nevus, characterized by oral, esophageal, and anal leukoplakia. The type II cytokeratins are clustered in a region of chromosome 12q12-q13. 3851
Y-box binding protein 3 ENSG00000060138 YBX3 NA 8531
aldolase, fructose-bisphosphate A ENSG00000149925 ALDOA The protein encoded by this gene, Aldolase A (fructose-bisphosphate aldolase), is a glycolytic enzyme that catalyzes the reversible conversion of fructose-1,6-bisphosphate to glyceraldehyde 3-phosphate and dihydroxyacetone phosphate. Three aldolase isozymes (A, B, and C), encoded by three different genes, are differentially expressed during development. Aldolase A is found in the developing embryo and is produced in even greater amounts in adult muscle. Aldolase A expression is repressed in adult liver, kidney and intestine and similar to aldolase C levels in brain and other nervous tissue. Aldolase A deficiency has been associated with myopathy and hemolytic anemia. Alternative splicing and alternative promoter usage results in multiple transcript variants. Related pseudogenes have been identified on chromosomes 3 and 10. 226
enolase 3 ENSG00000108515 ENO3 This gene encodes one of the three enolase isoenzymes found in mammals. This isoenzyme is found in skeletal muscle cells in the adult where it may play a role in muscle development and regeneration. A switch from alpha enolase to beta enolase occurs in muscle tissue during development in rodents. Mutations in this gene have been associated with glycogen storage disease. Alternatively spliced transcript variants encoding different isoforms have been described. 2027
natriuretic peptide A ENSG00000175206 NPPA The protein encoded by this gene belongs to the natriuretic peptide family. Natriuretic peptides are implicated in the control of extracellular fluid volume and electrolyte homeostasis. This protein is synthesized as a large precursor (containing a signal peptide), which is processed to release a peptide from the N-terminus with similarity to vasoactive peptide, cardiodilatin, and another peptide from the C-terminus with natriuretic-diuretic activity. Mutations in this gene have been associated with atrial fibrillation familial type 6. This gene is located adjacent to another member of the natriuretic family of peptides on chromosome 1. 4878
solute carrier family 25 member 4 ENSG00000151729 SLC25A4 This gene is a member of the mitochondrial carrier subfamily of solute carrier protein genes. The product of this gene functions as a gated pore that translocates ADP from the cytoplasm into the mitochondrial matrix and ATP from the mitochondrial matrix into the cytoplasm. The protein forms a homodimer embedded in the inner mitochondrial membrane. Mutations in this gene have been shown to result in autosomal dominant progressive external ophthalmoplegia and familial hypertrophic cardiomyopathy. 291
actin, alpha, cardiac muscle 1 ENSG00000159251 ACTC1 Actins are highly conserved proteins that are involved in various types of cell motility. Polymerization of globular actin (G-actin) leads to a structural filament (F-actin) in the form of a two-stranded helix. Each actin can bind to four others. The protein encoded by this gene belongs to the actin family which is comprised of three main groups of actin isoforms, alpha, beta, and gamma. The alpha actins are found in muscle tissues and are a major constituent of the contractile apparatus. Defects in this gene have been associated with idiopathic dilated cardiomyopathy (IDC) and familial hypertrophic cardiomyopathy (FHC). 70
myomesin 2 ENSG00000036448 MYOM2 The giant protein titin, together with its associated proteins, interconnects the major structure of sarcomeres, the M bands and Z discs. The C-terminal end of the titin string extends into the M line, where it binds tightly to M-band constituents of apparent molecular masses of 190 kD and 165 kD. The predicted MYOM2 protein contains 1,465 amino acids. Like MYOM1, MYOM2 has a unique N-terminal domain followed by 12 repeat domains with strong homology to either fibronectin type III or immunoglobulin C2 domains. Protein sequence comparisons suggested that the MYOM2 protein and bovine M protein are identical. 9172
uncharacterized LOC100129518 ENSG00000112096 LOC100129518 NA 100129518
superoxide dismutase 2, mitochondrial ENSG00000112096 SOD2 This gene is a member of the iron/manganese superoxide dismutase family. It encodes a mitochondrial protein that forms a homotetramer and binds one manganese ion per subunit. This protein binds to the superoxide byproducts of oxidative phosphorylation and converts them to hydrogen peroxide and diatomic oxygen. Mutations in this gene have been associated with idiopathic cardiomyopathy (IDC), premature aging, sporadic motor neuron disease, and cancer. Alternative splicing of this gene results in multiple transcript variants. A related pseudogene has been identified on chromosome 1. 6648
heat shock protein family B (small) member 7 ENSG00000173641 HSPB7 NA 27129
LIM domain binding 3 ENSG00000122367 LDB3 This gene encodes a PDZ domain-containing protein. PDZ motifs are modular protein-protein interaction domains consisting of 80-120 amino acid residues. PDZ domain-containing proteins interact with each other in cytoskeletal assembly or with other proteins involved in targeting and clustering of membrane proteins. The protein encoded by this gene interacts with alpha-actinin-2 through its N-terminal PDZ domain and with protein kinase C via its C-terminal LIM domains. The LIM domain is a cysteine-rich motif defined by 50-60 amino acids containing two zinc-binding modules. This protein also interacts with all three members of the myozenin family. Mutations in this gene have been associated with myofibrillar myopathy and dilated cardiomyopathy. Alternatively spliced transcript variants encoding different isoforms have been identified; all isoforms have N-terminal PDZ domains while only longer isoforms (1, 2 and 5) have C-terminal LIM domains. 11155
phosphorylase, glycogen, muscle ENSG00000068976 PYGM This gene encodes a muscle enzyme involved in glycogenolysis. Highly similar enzymes encoded by different genes are found in liver and brain. Mutations in this gene are associated with McArdle disease (myophosphorylase deficiency), a glycogen storage disease of muscle. Alternative splicing results in multiple transcript variants. 5837
titin-cap ENSG00000173991 TCAP Sarcomere assembly is regulated by the muscle protein titin. Titin is a giant elastic protein with kinase activity that extends half the length of a sarcomere. It serves as a scaffold to which myofibrils and other muscle related proteins are attached. This gene encodes a protein found in striated and cardiac muscle that binds to the titin Z1-Z2 domains and is a substrate of titin kinase, interactions thought to be critical to sarcomere assembly. Mutations in this gene are associated with limb-girdle muscular dystrophy type 2G. 8557
small proline rich protein 3 ENSG00000163209 SPRR3 NA 6707
myosin, heavy chain 10, non-muscle ENSG00000133026 MYH10 This gene encodes a member of the myosin superfamily. The protein represents a conventional non-muscle myosin; it should not be confused with the unconventional myosin-10 (MYO10). Myosins are actin-dependent motor proteins with diverse functions including regulation of cytokinesis, cell motility, and cell polarity. Mutations in this gene have been associated with May-Hegglin anomaly and developmental defects in brain and heart. Multiple transcript variants encoding different isoforms have been found for this gene. 4628
myomesin 1 ENSG00000101605 MYOM1 The giant protein titin, together with its associated proteins, interconnects the major structure of sarcomeres, the M bands and Z discs. The C-terminal end of the titin string extends into the M line, where it binds tightly to M-band constituents of apparent molecular masses of 190 kD (myomesin 1) and 165 kD (myomesin 2). This protein, myomesin 1, like myomesin 2, titin, and other myofibrillar proteins contains structural modules with strong homology to either fibronectin type III (motif I) or immunoglobulin C2 (motif II) domains. Myomesin 1 and myomesin 2 each have a unique N-terminal region followed by 12 modules of motif I or motif II, in the arrangement II-II-I-I-I-I-I-II-II-II-II-II. The two proteins share 50% sequence identity in this repeat-containing region. The head structure formed by these 2 proteins on one end of the titin string extends into the center of the M band. The integrating structure of the sarcomere arises from muscle-specific members of the superfamily of immunoglobulin-like proteins. Alternatively spliced transcript variants encoding different isoforms have been identified. 8736
thyroglobulin ENSG00000042832 TG Thyroglobulin (Tg) is a glycoprotein homodimer produced predominantly by the thyroid gland. It acts as a substrate for the synthesis of thyroxine and triiodothyronine as well as the storage of the inactive forms of thyroid hormone and iodine. Thyroglobulin is secreted from the endoplasmic reticulum to its site of iodination, and subsequent thyroxine biosynthesis, in the follicular lumen. Mutations in this gene cause thyroid dyshormonogenesis, manifested as goiter, and are associated with moderate to severe congenital hypothyroidism. Polymorphisms in this gene are associated with susceptibility to autoimmune thyroid diseases (AITD) such as Graves disease and Hashimoto thyroiditis. 7038
troponin T2, cardiac type ENSG00000118194 TNNT2 The protein encoded by this gene is the tropomyosin-binding subunit of the troponin complex, which is located on the thin filament of striated muscles and regulates muscle contraction in response to alterations in intracellular calcium ion concentration. Mutations in this gene have been associated with familial hypertrophic cardiomyopathy as well as with dilated cardiomyopathy. Transcripts for this gene undergo alternative splicing that results in many tissue-specific isoforms, however, the full-length nature of some of these variants has not yet been determined. 7139
versican ENSG00000038427 VCAN This gene is a member of the aggrecan/versican proteoglycan family. The protein encoded is a large chondroitin sulfate proteoglycan and is a major component of the extracellular matrix. This protein is involved in cell adhesion, proliferation, migration and angiogenesis and plays a central role in tissue morphogenesis and maintenance. Mutations in this gene are the cause of Wagner syndrome type 1. Multiple transcript variants encoding different isoforms have been found for this gene. 1462
cardiomyopathy associated 5 ENSG00000164309 CMYA5 NA 202333
pyruvate dehydrogenase kinase 4 ENSG00000004799 PDK4 This gene is a member of the PDK/BCKDK protein kinase family and encodes a mitochondrial protein with a histidine kinase domain. This protein is located in the matrix of the mitochondria and inhibits the pyruvate dehydrogenase complex by phosphorylating one of its subunits, thereby contributing to the regulation of glucose metabolism. Expression of this gene is regulated by glucocorticoids, retinoic acid and insulin. 5166
actin binding LIM protein 1 ENSG00000099204 ABLIM1 This gene encodes a cytoskeletal LIM protein that binds to actin filaments via a domain that is homologous to erythrocyte dematin. LIM domains, found in over 60 proteins, play key roles in the regulation of developmental pathways. LIM domains also function as protein-binding interfaces, mediating specific protein-protein interactions. The protein encoded by this gene could mediate such interactions between actin filaments and cytoplasmic targets. Alternatively spliced transcript variants encoding different isoforms have been identified. 3983
myosin light chain 2 ENSG00000111245 MYL2 This gene encodes the regulatory light chain associated with cardiac myosin beta (or slow) heavy chain. Ca2+ triggers the phosphorylation of regulatory light chain that in turn triggers contraction. Mutations in this gene are associated with mid-left ventricular chamber type hypertrophic cardiomyopathy. 4633
myosin light chain 3 ENSG00000160808 MYL3 MYL3 encodes myosin light chain 3, an alkali light chain also referred to in the literature as both the ventricular isoform and the slow skeletal muscle isoform. Mutations in MYL3 have been identified as a cause of mid-left ventricular chamber type hypertrophic cardiomyopathy. 4634
fatty acid binding protein 3 ENSG00000121769 FABP3 The intracellular fatty acid-binding proteins (FABPs) belong to a multigene family. FABPs are divided into at least three distinct types, namely the hepatic-, intestinal- and cardiac-type. They form 14-15 kDa proteins and are thought to participate in the uptake, intracellular metabolism and/or transport of long-chain fatty acids. They may also be responsible for the modulation of cell growth and proliferation. Fatty acid-binding protein 3 gene contains four exons and its function is to arrest growth of mammary epithelial cells. This gene is a candidate tumor suppressor gene for human breast cancer. Alternative splicing results in multiple transcript variants. 2170
NPPA antisense RNA 1 ENSG00000242349 NPPA-AS1 NA ENSG00000242349
heat shock protein family B (small) member 8 ENSG00000152137 HSPB8 The protein encoded by this gene belongs to the superfamily of small heat-shock proteins containing a conservative alpha-crystallin domain at the C-terminal part of the molecule. The expression of this gene is induced by estrogen in estrogen receptor-positive breast cancer cells, and this protein also functions as a chaperone in association with Bag3, a stimulator of macroautophagy. Thus, this gene appears to be involved in regulation of cell proliferation, apoptosis, and carcinogenesis, and mutations in this gene have been associated with different neuromuscular diseases, including Charcot-Marie-Tooth disease. 26353
phosphodiesterase 4D interacting protein ENSG00000178104 PDE4DIP The protein encoded by this gene serves to anchor phosphodiesterase 4D to the Golgi/centrosome region of the cell. Defects in this gene may be a cause of myeloproliferative disorder (MBD) associated with eosinophilia. Several transcript variants encoding different isoforms have been found for this gene. 9659
TIMP metallopeptidase inhibitor 3 ENSG00000100234 TIMP3 This gene belongs to the TIMP gene family. The proteins encoded by this gene family are inhibitors of the matrix metalloproteinases, a group of peptidases involved in degradation of the extracellular matrix (ECM). Expression of this gene is induced in response to mitogenic stimulation and this netrin domain-containing protein is localized to the ECM. Mutations in this gene have been associated with the autosomal dominant disorder Sorsby’s fundus dystrophy. 7078
heat shock protein 90kDa alpha family class A member 1 ENSG00000080824 HSP90AA1 The protein encoded by this gene is an inducible molecular chaperone that functions as a homodimer. The encoded protein aids in the proper folding of specific target proteins by use of an ATPase activity that is modulated by co-chaperones. Two transcript variants encoding different isoforms have been found for this gene. 3320
creatine kinase, mitochondrial 2 ENSG00000131730 CKMT2 Mitochondrial creatine kinase (MtCK) is responsible for the transfer of high energy phosphate from mitochondria to the cytosolic carrier, creatine. It belongs to the creatine kinase isoenzyme family. It exists as two isoenzymes, sarcomeric MtCK and ubiquitous MtCK, encoded by separate genes. Mitochondrial creatine kinase occurs in two different oligomeric forms: dimers and octamers, in contrast to the exclusively dimeric cytosolic creatine kinase isoenzymes. Sarcomeric mitochondrial creatine kinase has 80% homology with the coding exons of ubiquitous mitochondrial creatine kinase. This gene contains sequences homologous to several motifs that are shared among some nuclear genes encoding mitochondrial proteins and thus may be essential for the coordinated activation of these genes during mitochondrial biogenesis. Three transcript variants encoding the same protein have been found for this gene. 1160
actin, alpha 2, smooth muscle, aorta ENSG00000107796 ACTA2 The protein encoded by this gene belongs to the actin family of proteins, which are highly conserved proteins that play a role in cell motility, structure and integrity. Alpha, beta and gamma actin isoforms have been identified, with alpha actins being a major constituent of the contractile apparatus, while beta and gamma actins are involved in the regulation of cell motility. This actin is an alpha actin that is found in skeletal muscle. Defects in this gene cause aortic aneurysm familial thoracic type 6. Multiple alternatively spliced variants, encoding the same protein, have been identified. 59
matrix metallopeptidase 2 ENSG00000087245 MMP2 This gene is a member of the matrix metalloproteinase (MMP) gene family, that are zinc-dependent enzymes capable of cleaving components of the extracellular matrix and molecules involved in signal transduction. The protein encoded by this gene is a gelatinase A, type IV collagenase, that contains three fibronectin type II repeats in its catalytic site that allow binding of denatured type IV and V collagen and elastin. Unlike most MMP family members, activation of this protein can occur on the cell membrane. This enzyme can be activated extracellularly by proteases, or, intracellularly by its S-glutathiolation with no requirement for proteolytical removal of the pro-domain. This protein is thought to be involved in multiple pathways including roles in the nervous system, endometrial menstrual breakdown, regulation of vascularization, and metastasis. Mutations in this gene have been associated with Winchester syndrome and Nodulosis-Arthropathy-Osteolysis (NAO) syndrome. Alternative splicing results in multiple transcript variants encoding different isoforms. 4313
keratin 6A ENSG00000205420 KRT6A The protein encoded by this gene is a member of the keratin gene family. The type II cytokeratins consist of basic or neutral proteins which are arranged in pairs of heterotypic keratin chains coexpressed during differentiation of simple and stratified epithelial tissues. As many as six of this type II cytokeratin (KRT6) have been identified; the multiplicity of the genes is attributed to successive gene duplication events. The genes are expressed with family members KRT16 and/or KRT17 in the filiform papillae of the tongue, the stratified epithelial lining of oral mucosa and esophagus, the outer root sheath of hair follicles, and the glandular epithelia. This KRT6 gene in particular encodes the most abundant isoform. Mutations in these genes have been associated with pachyonychia congenita. In addition, peptides from the C-terminal region of the protein have antimicrobial activity against bacterial pathogens. The type II cytokeratins are clustered in a region of chromosome 12q12-q13. 3853
colony stimulating factor 3 receptor ENSG00000119535 CSF3R The protein encoded by this gene is the receptor for colony stimulating factor 3, a cytokine that controls the production, differentiation, and function of granulocytes. The encoded protein, which is a member of the family of cytokine receptors, may also function in some cell surface adhesion or recognition processes. Alternatively spliced transcript variants have been described. Mutations in this gene are a cause of Kostmann syndrome, also known as severe congenital neutropenia. 1441
troponin C1, slow skeletal and cardiac type ENSG00000114854 TNNC1 Troponin is a central regulatory protein of striated muscle contraction, and together with tropomyosin, is located on the actin filament. Troponin consists of 3 subunits: TnI, which is the inhibitor of actomyosin ATPase; TnT, which contains the binding site for tropomyosin; and TnC, the protein encoded by this gene. The binding of calcium to TnC abolishes the inhibitory action of TnI, thus allowing the interaction of actin with myosin, the hydrolysis of ATP, and the generation of tension. Mutations in this gene are associated with cardiomyopathy dilated type 1Z. 7134
phosphoglucomutase 1 ENSG00000079739 PGM1 The protein encoded by this gene is an isozyme of phosphoglucomutase (PGM) and belongs to the phosphohexose mutase family. There are several PGM isozymes, which are encoded by different genes and catalyze the transfer of phosphate between the 1 and 6 positions of glucose. In most cell types, this PGM isozyme is predominant, representing about 90% of total PGM activity. In red cells, PGM2 is a major isozyme. This gene is highly polymorphic. Mutations in this gene cause glycogen storage disease type 14. Alternatively spliced transcript variants encoding different isoforms have been identified in this gene. 5236
prostaglandin D2 synthase ENSG00000107317 PTGDS The protein encoded by this gene is a glutathione-independent prostaglandin D synthase that catalyzes the conversion of prostaglandin H2 (PGH2) to prostaglandin D2 (PGD2). PGD2 functions as a neuromodulator as well as a trophic factor in the central nervous system. PGD2 is also involved in smooth muscle contraction/relaxation and is a potent inhibitor of platelet aggregation. This gene is preferentially expressed in brain. Studies with transgenic mice overexpressing this gene suggest that this gene may be also involved in the regulation of non-rapid eye movement sleep. 5730
fatty acid synthase ENSG00000169710 FASN The enzyme encoded by this gene is a multifunctional protein. Its main function is to catalyze the synthesis of palmitate from acetyl-CoA and malonyl-CoA, in the presence of NADPH, into long-chain saturated fatty acids. In some cancer cell lines, this protein has been found to be fused with estrogen receptor-alpha (ER-alpha), in which the N-terminus of FAS is fused in-frame with the C-terminus of ER-alpha. 2194
complement component 1, s subcomponent ENSG00000182326 C1S This gene encodes a serine protease, which is a major constituent of the human complement subcomponent C1. C1s associates with two other complement components C1r and C1q in order to yield the first component of the serum complement system. Defects in this gene are the cause of selective C1s deficiency. 716
ankyrin repeat domain 1 ENSG00000148677 ANKRD1 The protein encoded by this gene is localized to the nucleus of endothelial cells and is induced by IL-1 and TNF-alpha stimulation. Studies in rat cardiomyocytes suggest that this gene functions as a transcription factor. Interactions between this protein and the sarcomeric proteins myopalladin and titin suggest that it may also be involved in the myofibrillar stretch-sensor system. 27063
integrin subunit alpha 8 ENSG00000077943 ITGA8 Integrins are heterodimeric transmembrane receptor proteins that mediate numerous cellular processes including cell adhesion, cytoskeletal rearrangement, and activation of cell signaling pathways. Integrins are composed of alpha and beta subunits. This gene encodes the alpha 8 subunit of the heterodimeric integrin alpha8beta1 protein. The encoded protein is a single-pass type 1 membrane protein that contains multiple FG-GAP repeats. This repeat is predicted to fold into a beta propeller structure. This gene regulates the recruitment of mesenchymal cells into epithelial structures, mediates cell-cell interactions, and regulates neurite outgrowth of sensory and motor neurons. The integrin alpha8beta1 protein thus plays an important role in wound-healing and organogenesis. Mutations in this gene have been associated with renal hypodysplasia/aplasia-1 (RHDA1) and with several animal models of chronic kidney disease. Alternate splicing results in multiple transcript variants encoding distinct isoforms. 8516
hydroxyacyl-CoA dehydrogenase/3-ketoacyl-CoA thiolase/enoyl-CoA hydratase (trifunctional protein), beta subunit ENSG00000138029 HADHB This gene encodes the beta subunit of the mitochondrial trifunctional protein, which catalyzes the last three steps of mitochondrial beta-oxidation of long chain fatty acids. The mitochondrial membrane-bound heterocomplex is composed of four alpha and four beta subunits, with the beta subunit catalyzing the 3-ketoacyl-CoA thiolase activity. The encoded protein can also bind RNA and decreases the stability of some mRNAs. The genes of the alpha and beta subunits of the mitochondrial trifunctional protein are located adjacent to each other in the human genome in a head-to-head orientation. Mutations in this gene result in trifunctional protein deficiency. Alternatively spliced transcript variants encoding different isoforms have been described. 3032
tumor protein p53 inducible nuclear protein 2 ENSG00000078804 TP53INP2 NA 58476
TIMP metallopeptidase inhibitor 2 ENSG00000035862 TIMP2 This gene is a member of the TIMP gene family. The proteins encoded by this gene family are natural inhibitors of the matrix metalloproteinases, a group of peptidases involved in degradation of the extracellular matrix. In addition to an inhibitory role against metalloproteinases, the encoded protein has a unique role among TIMP family members in its ability to directly suppress the proliferation of endothelial cells. As a result, the encoded protein may be critical to the maintenance of tissue homeostasis by suppressing the proliferation of quiescent tissues in response to angiogenic factors, and by inhibiting protease activity in tissues undergoing remodelling of the extracellular matrix. 7077
peroxiredoxin 6 ENSG00000117592 PRDX6 The protein encoded by this gene is a member of the thiol-specific antioxidant protein family. This protein is a bifunctional enzyme with two distinct active sites. It is involved in redox regulation of the cell; it can reduce H(2)O(2) and short chain organic, fatty acid, and phospholipid hydroperoxides. It may play a role in the regulation of phospholipid turnover as well as in protection against oxidative injury. 9588
NA ENSG00000229732 AC019349.5 NA ENSG00000229732
latent transforming growth factor beta binding protein 1 ENSG00000049323 LTBP1 The protein encoded by this gene belongs to the family of latent TGF-beta binding proteins (LTBPs). The secretion and activation of TGF-betas is regulated by their association with latency-associated proteins and with latent TGF-beta binding proteins. The product of this gene targets latent complexes of transforming growth factor beta to the extracellular matrix, where the latent cytokine is subsequently activated by several different mechanisms. Alternatively spliced transcript variants encoding different isoforms have been identified. 4052
hemoglobin subunit alpha 1 ENSG00000206172 HBA1 The human alpha globin gene cluster located on chromosome 16 spans about 30 kb and includes seven loci: 5’- zeta - pseudozeta - mu - pseudoalpha-1 - alpha-2 - alpha-1 - theta - 3’. The alpha-2 (HBA2) and alpha-1 (HBA1) coding sequences are identical. These genes differ slightly over the 5’ untranslated regions and the introns, but they differ significantly over the 3’ untranslated regions. Two alpha chains plus two beta chains constitute HbA, which in normal adult life comprises about 97% of the total hemoglobin; alpha chains combine with delta chains to constitute HbA-2, which with HbF (fetal hemoglobin) makes up the remaining 3% of adult hemoglobin. Alpha thalassemias result from deletions of each of the alpha genes as well as deletions of both HBA2 and HBA1; some nondeletion alpha thalassemias have also been reported. 3039
acetyl-CoA carboxylase beta ENSG00000076555 ACACB Acetyl-CoA carboxylase (ACC) is a complex multifunctional enzyme system. ACC is a biotin-containing enzyme which catalyzes the carboxylation of acetyl-CoA to malonyl-CoA, the rate-limiting step in fatty acid synthesis. ACC-beta is thought to control fatty acid oxidation by means of the ability of malonyl-CoA to inhibit carnitine-palmitoyl-CoA transferase I, the rate-limiting step in fatty acid uptake and oxidation by mitochondria. ACC-beta may be involved in the regulation of fatty acid oxidation, rather than fatty acid biosynthesis. There is evidence for the presence of two ACC-beta isoforms. 32
solute carrier family 2 member 3 ENSG00000059804 SLC2A3 NA 6515
myosin binding protein C, cardiac ENSG00000134571 MYBPC3 MYBPC3 encodes the cardiac isoform of myosin-binding protein C. Myosin-binding protein C is a myosin-associated protein found in the cross-bridge-bearing zone (C region) of A bands in striated muscle. MYBPC3, the cardiac isoform, is expressed exclusively in heart muscle. Regulatory phosphorylation of the cardiac isoform in vivo by cAMP-dependent protein kinase (PKA) upon adrenergic stimulation may be linked to modulation of cardiac contraction. Mutations in MYBPC3 are one cause of familial hypertrophic cardiomyopathy. 4607
nebulin ENSG00000183091 NEB This gene encodes nebulin, a giant protein component of the cytoskeletal matrix that coexists with the thick and thin filaments within the sarcomeres of skeletal muscle. In most vertebrates, nebulin accounts for 3 to 4% of the total myofibrillar protein. The encoded protein contains approximately 30-amino acid long modules that can be classified into 7 types and other repeated modules. Protein isoform sizes vary from 600 to 800 kD due to alternative splicing that is tissue-, species-, and developmental stage-specific. Of the 183 exons in the nebulin gene, at least 43 are alternatively spliced, although exons 143 and 144 are not found in the same transcript. Of the several thousand transcript variants predicted for nebulin, the RefSeq Project has decided to create three representative RefSeq records. Mutations in this gene are associated with recessive nemaline myopathy. 4703
histidine rich calcium binding protein ENSG00000130528 HRC This gene encodes a luminal sarcoplasmic reticulum protein identified by its ability to bind low-density lipoprotein with high affinity. The protein interacts with the cytoplasmic domain of triadin, the main transmembrane protein of the junctional sarcoplasmic reticulum (SR) of skeletal muscle. The protein functions in the regulation of releasable calcium into the SR. 3270
caveolin 1 ENSG00000105974 CAV1 The scaffolding protein encoded by this gene is the main component of the caveolae plasma membranes found in most cell types. The protein links integrin subunits to the tyrosine kinase FYN, an initiating step in coupling integrins to the Ras-ERK pathway and promoting cell cycle progression. The gene is a tumor suppressor gene candidate and a negative regulator of the Ras-p42/44 mitogen-activated kinase cascade. Caveolin 1 and caveolin 2 are located next to each other on chromosome 7 and express colocalizing proteins that form a stable hetero-oligomeric complex. Mutations in this gene have been associated with Berardinelli-Seip congenital lipodystrophy. Alternatively spliced transcripts encode alpha and beta isoforms of caveolin 1. 857
nebulin related anchoring protein ENSG00000197893 NRAP NA 4892
collagen type VI alpha 3 chain ENSG00000163359 COL6A3 This gene encodes the alpha-3 chain, one of the three alpha chains of type VI collagen, a beaded filament collagen found in most connective tissues. The alpha-3 chain of type VI collagen is much larger than the alpha-1 and -2 chains. This difference in size is largely due to an increase in the number of subdomains, similar to von Willebrand Factor type A domains, that are found in the amino terminal globular domain of all the alpha chains. These domains have been shown to bind extracellular matrix proteins, an interaction that explains the importance of this collagen in organizing matrix components. Mutations in the type VI collagen genes are associated with Bethlem myopathy, a rare autosomal dominant proximal myopathy with early childhood onset. Mutations in this gene are also a cause of Ullrich congenital muscular dystrophy, also referred to as Ullrich scleroatonic muscular dystrophy, an autosomal recessive congenital myopathy that is more severe than Bethlem myopathy. Multiple transcript variants have been identified, but the full-length nature of only some of these variants has been described. 1293
F-box and leucine rich repeat protein 16 ENSG00000127585 FBXL16 Members of the F-box protein family, such as FBXL16, are characterized by an approximately 40-amino acid F-box motif. SCF complexes, formed by SKP1 (MIM 601434), cullin (see CUL1; MIM 603134), and F-box proteins, act as protein-ubiquitin ligases. F-box proteins interact with SKP1 through the F box, and they interact with ubiquitination targets through other protein interaction domains (Jin et al., 2004 [PubMed 15520277]). 146330
cornulin ENSG00000143536 CRNN This gene encodes a member of the ‘fused gene’ family of proteins, which contain N-terminus EF-hand domains and multiple tandem peptide repeats. The encoded protein contains two EF-hand Ca2+ binding domains in its N-terminus and two glutamine- and threonine-rich 60 amino acid repeats in its C-terminus. This gene, also known as squamous epithelial heat shock protein 53, may play a role in the mucosal/epithelial immune response and epidermal differentiation. 49860
heat shock protein family A (Hsp70) member 1B ENSG00000204388 HSPA1B This intronless gene encodes a 70kDa heat shock protein which is a member of the heat shock protein 70 family. In conjunction with other heat shock proteins, this protein stabilizes existing proteins against aggregation and mediates the folding of newly translated proteins in the cytosol and in organelles. It is also involved in the ubiquitin-proteasome pathway through interaction with the AU-rich element RNA-binding protein 1. The gene is located in the major histocompatibility complex class III region, in a cluster with two closely related genes which encode similar proteins. 3304
heparan sulfate proteoglycan 2 ENSG00000142798 HSPG2 This gene encodes the perlecan protein, which consists of a core protein to which three long chains of glycosaminoglycans (heparan sulfate or chondroitin sulfate) are attached. The perlecan protein is a large multidomain proteoglycan that binds to and cross-links many extracellular matrix components and cell-surface molecules. It has been shown that this protein interacts with laminin, prolargin, collagen type IV, FGFBP1, FBLN2, FGF7 and transthyretin, etc., and it plays essential roles in multiple biological activities. Perlecan is a key component of the vascular extracellular matrix, where it helps to maintain the endothelial barrier function. It is a potent inhibitor of smooth muscle cell proliferation and is thus thought to help maintain vascular homeostasis. It can also promote growth factor (e.g., FGF2) activity and thus stimulate endothelial growth and re-generation. It is a major component of basement membranes, where it is involved in the stabilization of other molecules as well as being involved with glomerular permeability to macromolecules and cell adhesion. Mutations in this gene cause Schwartz-Jampel syndrome type 1, Silverman-Handmaker type of dyssegmental dysplasia, and tardive dyskinesia. Alternative splicing of this gene results in multiple transcript variants. 3339
microtubule associated monooxygenase, calponin and LIM domain containing 1 ENSG00000135596 MICAL1 This gene encodes an enzyme that oxidizes methionine residues on actin, thereby promoting depolymerization of actin filaments. This protein interacts with and regulates signalling by NEDD9/CAS-L (neural precursor cell expressed, developmentally down-regulated 9). Alternative splicing results in multiple transcript variants. 64780
vimentin ENSG00000026025 VIM This gene encodes a member of the intermediate filament family. Intermediate filaments, along with microtubules and actin microfilaments, make up the cytoskeleton. The protein encoded by this gene is responsible for maintaining cell shape, integrity of the cytoplasm, and stabilizing cytoskeletal interactions. It is also involved in the immune response, and controls the transport of low-density lipoprotein (LDL)-derived cholesterol from a lysosome to the site of esterification. It functions as an organizer of a number of critical proteins involved in attachment, migration, and cell signaling. Mutations in this gene cause a dominant, pulverulent cataract. 7431
myosin, heavy chain 2, skeletal muscle, adult ENSG00000125414 MYH2 Myosins are actin-based motor proteins that function in the generation of mechanical force in eukaryotic cells. Muscle myosins are heterohexamers composed of 2 myosin heavy chains and 2 pairs of nonidentical myosin light chains. This gene encodes a member of the class II or conventional myosin heavy chains, and functions in skeletal muscle contraction. This gene is found in a cluster of myosin heavy chain genes on chromosome 17. A mutation in this gene results in inclusion body myopathy-3. Multiple alternatively spliced variants, encoding the same protein, have been identified. 4620
nicotinamide N-methyltransferase ENSG00000166741 NNMT N-methylation is one method by which drug and other xenobiotic compounds are metabolized by the liver. This gene encodes the protein responsible for this enzymatic activity which uses S-adenosyl methionine as the methyl donor. 4837
osteoglycin ENSG00000106809 OGN This gene encodes a member of the small leucine-rich proteoglycan (SLRP) family of proteins. The encoded protein induces ectopic bone formation in conjunction with transforming growth factor beta and may regulate osteoblast differentiation. High expression of the encoded protein may be associated with elevated heart left ventricular mass. Alternative splicing results in multiple transcript variants. 4969
retinol saturase ENSG00000042445 RETSAT NA 54884
semaphorin 3B ENSG00000012171 SEMA3B The protein encoded by this gene belongs to the class-3 semaphorin/collapsin family, whose members function in growth cone guidance during neuronal development. This family member inhibits axonal extension and has been shown to act as a tumor suppressor by inducing apoptosis. Alternative splicing of this gene results in multiple transcript variants. 7869
protein phosphatase 1 regulatory inhibitor subunit 14A ENSG00000167641 PPP1R14A The protein encoded by this gene belongs to the protein phosphatase 1 (PP1) inhibitor family. This protein is an inhibitor of smooth muscle myosin phosphatase, and has higher inhibitory activity when phosphorylated. Inhibition of myosin phosphatase leads to increased myosin phosphorylation and enhanced smooth muscle contraction. Alternatively spliced transcript variants encoding different isoforms have been noted for this gene. 94274
latent transforming growth factor beta binding protein 2 ENSG00000119681 LTBP2 The protein encoded by this gene belongs to the family of latent transforming growth factor (TGF)-beta binding proteins (LTBP), which are extracellular matrix proteins with multi-domain structure. This protein is the largest member of the LTBP family possessing unique regions and with most similarity to the fibrillins. It has thus been suggested that it may have multiple functions: as a member of the TGF-beta latent complex, as a structural component of microfibrils, and a role in cell adhesion. 4053
Ran GTPase activating protein 1 ENSG00000100401 RANGAP1 This gene encodes a protein that associates with the nuclear pore complex and participates in the regulation of nuclear transport. The encoded protein interacts with Ras-related nuclear protein 1 (RAN) and regulates guanosine triphosphate (GTP)-binding and exchange. Alternative splicing results in multiple transcript variants. 5905
cystatin B ENSG00000160213 CSTB The cystatin superfamily encompasses proteins that contain multiple cystatin-like sequences. Some of the members are active cysteine protease inhibitors, while others have lost or perhaps never acquired this inhibitory activity. There are three inhibitory families in the superfamily, including the type 1 cystatins (stefins), type 2 cystatins and kininogens. This gene encodes a stefin that functions as an intracellular thiol protease inhibitor. The protein is able to form a dimer stabilized by noncovalent forces, inhibiting papain and cathepsins l, h and b. The protein is thought to play a role in protecting against the proteases leaking from lysosomes. Evidence indicates that mutations in this gene are responsible for the primary defects in patients with progressive myoclonic epilepsy (EPM1). 1476
DDB1 and CUL4 associated factor 6 ENSG00000143164 DCAF6 NA 55827
nicotinamide nucleotide transhydrogenase ENSG00000112992 NNT This gene encodes an integral protein of the inner mitochondrial membrane. The enzyme couples hydride transfer between NAD(H) and NADP(+) to proton translocation across the inner mitochondrial membrane. Under most physiological conditions, the enzyme uses energy from the mitochondrial proton gradient to produce high concentrations of NADPH. The resulting NADPH is used for biosynthesis and in free radical detoxification. Two alternatively spliced variants, encoding the same protein, have been found for this gene. 23530
integrin subunit alpha X ENSG00000140678 ITGAX This gene encodes the integrin alpha X chain protein. Integrins are heterodimeric integral membrane proteins composed of an alpha chain and a beta chain. This protein combines with the beta 2 chain (ITGB2) to form a leukocyte-specific integrin referred to as inactivated-C3b (iC3b) receptor 4 (CR4). The alpha X beta 2 complex seems to overlap the properties of the alpha M beta 2 integrin in the adherence of neutrophils and monocytes to stimulated endothelium cells, and in the phagocytosis of complement coated particles. Two transcript variants encoding different isoforms have been found for this gene. 3687
complement component 3 ENSG00000125730 C3 Complement component C3 plays a central role in the activation of complement system. Its activation is required for both classical and alternative complement activation pathways. The encoded preproprotein is proteolytically processed to generate alpha and beta subunits that form the mature protein, which is then further processed to generate numerous peptide products. The C3a peptide, also known as the C3a anaphylatoxin, modulates inflammation and possesses antimicrobial activity. Mutations in this gene are associated with atypical hemolytic uremic syndrome and age-related macular degeneration in human patients. 718
extracellular matrix protein 1 ENSG00000143369 ECM1 This gene encodes a soluble protein that is involved in endochondral bone formation, angiogenesis, and tumor biology. It also interacts with a variety of extracellular and structural proteins, contributing to the maintenance of skin integrity and homeostasis. Mutations in this gene are associated with lipoid proteinosis disorder (also known as hyalinosis cutis et mucosae or Urbach-Wiethe disease) that is characterized by generalized thickening of skin, mucosae and certain viscera. Alternatively spliced transcript variants encoding distinct isoforms have been described for this gene. 1893
uncharacterized LOC105372824 ENSG00000160209 LOC105372824 NA 105372824
pyridoxal (pyridoxine, vitamin B6) kinase ENSG00000160209 PDXK The protein encoded by this gene phosphorylates vitamin B6, a step required for the conversion of vitamin B6 to pyridoxal-5-phosphate, an important cofactor in intermediary metabolism. The encoded protein is cytoplasmic and probably acts as a homodimer. Alternatively spliced transcript variants have been described, but their biological validity has not been determined. 8566
malate dehydrogenase 1 ENSG00000014641 MDH1 This gene encodes an enzyme that catalyzes the NAD/NADH-dependent, reversible oxidation of malate to oxaloacetate in many metabolic pathways, including the citric acid cycle. Two main isozymes are known to exist in eukaryotic cells: one is found in the mitochondrial matrix and the other in the cytoplasm. This gene encodes the cytosolic isozyme, which plays a key role in the malate-aspartate shuttle that allows malate to pass through the mitochondrial membrane to be transformed into oxaloacetate for further cellular processes. Alternatively spliced transcript variants have been found for this gene. A recent study showed that a C-terminally extended isoform is produced by use of an alternative in-frame translation termination codon via a stop codon readthrough mechanism, and that this isoform is localized in the peroxisomes. Pseudogenes have been identified on chromosomes X and 6. 4190
myosin, heavy chain 9, non-muscle ENSG00000100345 MYH9 This gene encodes a conventional non-muscle myosin; this protein should not be confused with the unconventional myosin-9a or 9b (MYO9A or MYO9B). The encoded protein is a myosin IIA heavy chain that contains an IQ domain and a myosin head-like domain which is involved in several important functions, including cytokinesis, cell motility and maintenance of cell shape. Defects in this gene have been associated with non-syndromic sensorineural deafness autosomal dominant type 17, Epstein syndrome, Alport syndrome with macrothrombocytopenia, Sebastian syndrome, Fechtner syndrome and macrothrombocytopenia with progressive sensorineural deafness. 4627
zinc finger protein 106 ENSG00000103994 ZNF106 NA 64397
heat shock protein family A (Hsp70) member 6 ENSG00000173110 HSPA6 NA 3310
acyl-CoA synthetase long-chain family member 1 ENSG00000151726 ACSL1 The protein encoded by this gene is an isozyme of the long-chain fatty-acid-coenzyme A ligase family. Although differing in substrate specificity, subcellular localization, and tissue distribution, all isozymes of this family convert free long-chain fatty acids into fatty acyl-CoA esters, and thereby play a key role in lipid biosynthesis and fatty acid degradation. Several transcript variants encoding different isoforms have been found for this gene. 2180
phosphatidylinositol-3,4,5-trisphosphate dependent Rac exchange factor 1 ENSG00000124126 PREX1 The protein encoded by this gene acts as a guanine nucleotide exchange factor for the RHO family of small GTP-binding proteins (RACs). It has been shown to bind to and activate RAC1 by exchanging bound GDP for free GTP. The encoded protein, which is found mainly in the cytoplasm, is activated by phosphatidylinositol-3,4,5-trisphosphate and the beta-gamma subunits of heterotrimeric G proteins. 57580
# Save the queried Ensembl gene IDs for factor 13, one per line.
write.table(as.factor(out$query),
            paste0("../utilities/GTEX2013_sparse_load_sqrt/gene_names_clus_", 13, ".txt"),
            col.names = FALSE, row.names = FALSE, quote = FALSE)
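
A single Ensembl ID can come back with more than one annotation record; in the Factor 13 table, ENSG00000160209 appears once as LOC105372824 and once as PDXK. Writing `out$query` as-is therefore lists that ID twice in the output file. The sketch below shows one possible way to keep each queried ID once before writing. The helper name `unique_query_ids`, the toy data frame `out_example`, and the temporary output path are illustrative assumptions, not part of the original analysis.

```r
# Toy stand-in for the `out` object returned by the annotation query.
# The rows reuse three query/symbol pairs from the Factor 13 table.
out_example <- data.frame(
  query  = c("ENSG00000160209", "ENSG00000160209", "ENSG00000014641"),
  symbol = c("LOC105372824", "PDXK", "MDH1"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return each queried Ensembl ID once, in first-seen order.
unique_query_ids <- function(annot) {
  ids <- as.character(annot$query)
  ids[!duplicated(ids)]
}

ids <- unique_query_ids(out_example)

# A temporary file is used here; the analysis would substitute its own path.
out_file <- tempfile(fileext = ".txt")
write.table(ids, out_file, col.names = FALSE, row.names = FALSE, quote = FALSE)

# Read the file back to confirm one ID per line.
written <- readLines(out_file)
```

Whether duplicates should be dropped depends on how the gene-name files are consumed later; if a downstream tool expects one line per annotation record, the original behavior may be the intended one.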

Factor 14 Annotations

out <- mygene::queryMany(gene_list[14,],  scopes="ensembl.gene", fields=c("name", "summary", "symbol"), species="human");
## Finished
## Pass returnall=TRUE to return lists of duplicate or missing query terms.
kable(as.data.frame(out))
symbol X_id summary query name notfound
NPPA 4878 The protein encoded by this gene belongs to the natriuretic peptide family. Natriuretic peptides are implicated in the control of extracellular fluid volume and electrolyte homeostasis. This protein is synthesized as a large precursor (containing a signal peptide), which is processed to release a peptide from the N-terminus with similarity to vasoactive peptide, cardiodilatin, and another peptide from the C-terminus with natriuretic-diuretic activity. Mutations in this gene have been associated with atrial fibrillation familial type 6. This gene is located adjacent to another member of the natriuretic family of peptides on chromosome 1. ENSG00000175206 natriuretic peptide A NA
MYH6 4624 Cardiac muscle myosin is a hexamer consisting of two heavy chain subunits, two light chain subunits, and two regulatory subunits. This gene encodes the alpha heavy chain subunit of cardiac myosin. The gene is located 4kb downstream of the gene encoding the beta heavy chain subunit of cardiac myosin. Mutations in this gene cause familial hypertrophic cardiomyopathy and atrial septal defect 3. ENSG00000197616 myosin, heavy chain 6, cardiac muscle, alpha NA
ACTC1 70 Actins are highly conserved proteins that are involved in various types of cell motility. Polymerization of globular actin (G-actin) leads to a structural filament (F-actin) in the form of a two-stranded helix. Each actin can bind to four others. The protein encoded by this gene belongs to the actin family which is comprised of three main groups of actin isoforms, alpha, beta, and gamma. The alpha actins are found in muscle tissues and are a major constituent of the contractile apparatus. Defects in this gene have been associated with idiopathic dilated cardiomyopathy (IDC) and familial hypertrophic cardiomyopathy (FHC). ENSG00000159251 actin, alpha, cardiac muscle 1 NA
NPPA-AS1 ENSG00000242349 NA ENSG00000242349 NPPA antisense RNA 1 NA
ANKRD1 27063 The protein encoded by this gene is localized to the nucleus of endothelial cells and is induced by IL-1 and TNF-alpha stimulation. Studies in rat cardiomyocytes suggest that this gene functions as a transcription factor. Interactions between this protein and the sarcomeric proteins myopalladin and titin suggest that it may also be involved in the myofibrillar stretch-sensor system. ENSG00000148677 ankyrin repeat domain 1 NA
FN1 2335 This gene encodes fibronectin, a glycoprotein present in a soluble dimeric form in plasma, and in a dimeric or multimeric form at the cell surface and in extracellular matrix. The encoded preproprotein is proteolytically processed to generate the mature protein. Fibronectin is involved in cell adhesion and migration processes including embryogenesis, wound healing, blood coagulation, host defense, and metastasis. The gene has three regions subject to alternative splicing, with the potential to produce 20 different transcript variants, at least one of which encodes an isoform that undergoes proteolytic processing. The full-length nature of some variants has not been determined. ENSG00000115414 fibronectin 1 NA
PAM 5066 This gene encodes a multifunctional protein. The encoded preproprotein is proteolytically processed to generate the mature enzyme. This enzyme includes two domains with distinct catalytic activities, a peptidylglycine alpha-hydroxylating monooxygenase (PHM) domain and a peptidyl-alpha-hydroxyglycine alpha-amidating lyase (PAL) domain. These catalytic domains work sequentially to catalyze the conversion of neuroendocrine peptides to active alpha-amidated products. Alternative splicing results in multiple transcript variants, at least one of which encodes an isoform that is proteolytically processed. ENSG00000145730 peptidylglycine alpha-amidating monooxygenase NA
MB 4151 This gene encodes a member of the globin superfamily and is expressed in skeletal and cardiac muscles. The encoded protein is a haemoprotein contributing to intracellular oxygen storage and transcellular facilitated diffusion of oxygen. At least three alternatively spliced transcript variants encoding the same protein have been reported. ENSG00000198125 myoglobin NA
COL1A1 1277 This gene encodes the pro-alpha1 chains of type I collagen whose triple helix comprises two alpha1 chains and one alpha2 chain. Type I is a fibril-forming collagen found in most connective tissues and is abundant in bone, cornea, dermis and tendon. Mutations in this gene are associated with osteogenesis imperfecta types I-IV, Ehlers-Danlos syndrome type VIIA, Ehlers-Danlos syndrome Classical type, Caffey Disease and idiopathic osteoporosis. Reciprocal translocations between chromosomes 17 and 22, where this gene and the gene for platelet-derived growth factor beta are located, are associated with a particular type of skin tumor called dermatofibrosarcoma protuberans, resulting from unregulated expression of the growth factor. Two transcripts, resulting from the use of alternate polyadenylation signals, have been identified for this gene. ENSG00000108821 collagen type I alpha 1 NA
MYH7 4625 Muscle myosin is a hexameric protein containing 2 heavy chain subunits, 2 alkali light chain subunits, and 2 regulatory light chain subunits. This gene encodes the beta (or slow) heavy chain subunit of cardiac myosin. It is expressed predominantly in normal human ventricle. It is also expressed in skeletal muscle tissues rich in slow-twitch type I muscle fibers. Changes in the relative abundance of this protein and the alpha (or fast) heavy subunit of cardiac myosin correlate with the contractile velocity of cardiac muscle. Its expression is also altered during thyroid hormone depletion and hemodynamic overloading. Mutations in this gene are associated with familial hypertrophic cardiomyopathy, myosin storage myopathy, dilated cardiomyopathy, and Laing early-onset distal myopathy. ENSG00000092054 myosin, heavy chain 7, cardiac muscle, beta NA
DKK3 27122 This gene encodes a protein that is a member of the dickkopf family. The secreted protein contains two cysteine rich regions and is involved in embryonic development through its interactions with the Wnt signaling pathway. The expression of this gene is decreased in a variety of cancer cell lines and it may function as a tumor suppressor gene. Alternative splicing results in multiple transcript variants encoding the same protein. ENSG00000050165 dickkopf WNT signaling pathway inhibitor 3 NA
MYL7 58498 NA ENSG00000106631 myosin light chain 7 NA
SYNPO 11346 Synaptopodin is an actin-associated protein that may play a role in actin-based cell shape and motility. The name synaptopodin derives from the protein’s associations with postsynaptic densities and dendritic spines and with renal podocytes (Mundel et al., 1997 [PubMed 9314539]). ENSG00000171992 synaptopodin NA
ATP2A2 488 This gene encodes one of the SERCA Ca(2+)-ATPases, which are intracellular pumps located in the sarcoplasmic or endoplasmic reticula of muscle cells. This enzyme catalyzes the hydrolysis of ATP coupled with the translocation of calcium from the cytosol into the sarcoplasmic reticulum lumen, and is involved in regulation of the contraction/relaxation cycle. Mutations in this gene cause Darier-White disease, also known as keratosis follicularis, an autosomal dominant skin disorder characterized by loss of adhesion between epidermal cells and abnormal keratinization. Alternative splicing results in multiple transcript variants encoding different isoforms. ENSG00000174437 ATPase sarcoplasmic/endoplasmic reticulum Ca2+ transporting 2 NA
HSPB7 27129 NA ENSG00000173641 heat shock protein family B (small) member 7 NA
FTL 2512 This gene encodes the light subunit of the ferritin protein. Ferritin is the major intracellular iron storage protein in prokaryotes and eukaryotes. It is composed of 24 subunits of the heavy and light ferritin chains. Variation in ferritin subunit composition may affect the rates of iron uptake and release in different tissues. A major function of ferritin is the storage of iron in a soluble and nontoxic state. Defects in this light chain ferritin gene are associated with several neurodegenerative diseases and hyperferritinemia-cataract syndrome. This gene has multiple pseudogenes. ENSG00000087086 ferritin, light polypeptide NA
FABP3 2170 The intracellular fatty acid-binding proteins (FABPs) belong to a multigene family. FABPs are divided into at least three distinct types, namely the hepatic-, intestinal- and cardiac-type. They form 14-15 kDa proteins and are thought to participate in the uptake, intracellular metabolism and/or transport of long-chain fatty acids. They may also be responsible for the modulation of cell growth and proliferation. Fatty acid-binding protein 3 gene contains four exons and its function is to arrest growth of mammary epithelial cells. This gene is a candidate tumor suppressor gene for human breast cancer. Alternative splicing results in multiple transcript variants. ENSG00000121769 fatty acid binding protein 3 NA
ACTN2 88 Alpha actinins belong to the spectrin gene superfamily which represents a diverse group of cytoskeletal proteins, including the alpha and beta spectrins and dystrophins. Alpha actinin is an actin-binding protein with multiple roles in different cell types. In nonmuscle cells, the cytoskeletal isoform is found along microfilament bundles and adherens-type junctions, where it is involved in binding actin to the membrane. In contrast, skeletal, cardiac, and smooth muscle isoforms are localized to the Z-disc and analogous dense bodies, where they help anchor the myofibrillar actin filaments. This gene encodes a muscle-specific, alpha actinin isoform that is expressed in both skeletal and cardiac muscles. Several transcript variants encoding different isoforms have been found for this gene. ENSG00000077522 actinin alpha 2 NA
COL6A3 1293 This gene encodes the alpha-3 chain, one of the three alpha chains of type VI collagen, a beaded filament collagen found in most connective tissues. The alpha-3 chain of type VI collagen is much larger than the alpha-1 and -2 chains. This difference in size is largely due to an increase in the number of subdomains, similar to von Willebrand Factor type A domains, that are found in the amino terminal globular domain of all the alpha chains. These domains have been shown to bind extracellular matrix proteins, an interaction that explains the importance of this collagen in organizing matrix components. Mutations in the type VI collagen genes are associated with Bethlem myopathy, a rare autosomal dominant proximal myopathy with early childhood onset. Mutations in this gene are also a cause of Ullrich congenital muscular dystrophy, also referred to as Ullrich scleroatonic muscular dystrophy, an autosomal recessive congenital myopathy that is more severe than Bethlem myopathy. Multiple transcript variants have been identified, but the full-length nature of only some of these variants has been described. ENSG00000163359 collagen type VI alpha 3 chain NA
DSTN 11034 The product of this gene belongs to the actin-binding proteins ADF family. This family of proteins is responsible for enhancing the turnover rate of actin in vivo. This gene encodes the actin depolymerizing protein that severs actin filaments (F-actin) and binds to actin monomers (G-actin). Two transcript variants encoding distinct isoforms have been identified for this gene. ENSG00000125868 destrin, actin depolymerizing factor NA
TCAP 8557 Sarcomere assembly is regulated by the muscle protein titin. Titin is a giant elastic protein with kinase activity that extends half the length of a sarcomere. It serves as a scaffold to which myofibrils and other muscle related proteins are attached. This gene encodes a protein found in striated and cardiac muscle that binds to the titin Z1-Z2 domains and is a substrate of titin kinase, interactions thought to be critical to sarcomere assembly. Mutations in this gene are associated with limb-girdle muscular dystrophy type 2G. ENSG00000173991 titin-cap NA
SERPINE1 5054 This gene encodes a member of the serine proteinase inhibitor (serpin) superfamily. This member is the principal inhibitor of tissue plasminogen activator (tPA) and urokinase (uPA), and hence is an inhibitor of fibrinolysis. Defects in this gene are the cause of plasminogen activator inhibitor-1 deficiency (PAI-1 deficiency), and high concentrations of the gene product are associated with thrombophilia. Alternatively spliced transcript variants encoding different isoforms have been found for this gene. ENSG00000106366 serpin family E member 1 NA
MYBPC3 4607 MYBPC3 encodes the cardiac isoform of myosin-binding protein C. Myosin-binding protein C is a myosin-associated protein found in the cross-bridge-bearing zone (C region) of A bands in striated muscle. MYBPC3, the cardiac isoform, is expressed exclusively in heart muscle. Regulatory phosphorylation of the cardiac isoform in vivo by cAMP-dependent protein kinase (PKA) upon adrenergic stimulation may be linked to modulation of cardiac contraction. Mutations in MYBPC3 are one cause of familial hypertrophic cardiomyopathy. ENSG00000134571 myosin binding protein C, cardiac NA
TTN 7273 This gene encodes a large abundant protein of striated muscle. The product of this gene is divided into two regions, a N-terminal I-band and a C-terminal A-band. The I-band, which is the elastic part of the molecule, contains two regions of tandem immunoglobulin domains on either side of a PEVK region that is rich in proline, glutamate, valine and lysine. The A-band, which is thought to act as a protein-ruler, contains a mixture of immunoglobulin and fibronectin repeats, and possesses kinase activity. An N-terminal Z-disc region and a C-terminal M-line region bind to the Z-line and M-line of the sarcomere, respectively, so that a single titin molecule spans half the length of a sarcomere. Titin also contains binding sites for muscle associated proteins so it serves as an adhesion template for the assembly of contractile machinery in muscle cells. It has also been identified as a structural protein for chromosomes. Alternative splicing of this gene results in multiple transcript variants. Considerable variability exists in the I-band, the M-line and the Z-disc regions of titin. Variability in the I-band region contributes to the differences in elasticity of different titin isoforms and, therefore, to the differences in elasticity of different muscle types. Mutations in this gene are associated with familial hypertrophic cardiomyopathy 9, and autoantibodies to titin are produced in patients with the autoimmune disease scleroderma. ENSG00000155657 titin NA
COL6A2 1292 This gene encodes one of the three alpha chains of type VI collagen, a beaded filament collagen found in most connective tissues. The product of this gene contains several domains similar to von Willebrand Factor type A domains. These domains have been shown to bind extracellular matrix proteins, an interaction that explains the importance of this collagen in organizing matrix components. Mutations in this gene are associated with Bethlem myopathy and Ullrich scleroatonic muscular dystrophy. Three transcript variants have been identified for this gene. ENSG00000142173 collagen type VI alpha 2 NA
MYL12A 10627 This gene encodes a nonsarcomeric myosin regulatory light chain. This protein is activated by phosphorylation and regulates smooth muscle and non-muscle cell contraction. This protein may also be involved in DNA damage repair by sequestering the transcriptional regulator apoptosis-antagonizing transcription factor (AATF)/Che-1 which functions as a repressor of p53-driven apoptosis. Alternate splicing results in multiple transcript variants. A pseudogene of this gene is found on chromosome 8. ENSG00000101608 myosin light chain 12A NA
PTX3 5806 NA ENSG00000163661 pentraxin 3 NA
ACTA2 59 The protein encoded by this gene belongs to the actin family of proteins, which are highly conserved proteins that play a role in cell motility, structure and integrity. Alpha, beta and gamma actin isoforms have been identified, with alpha actins being a major constituent of the contractile apparatus, while beta and gamma actins are involved in the regulation of cell motility. This actin is an alpha actin that is found in skeletal muscle. Defects in this gene cause aortic aneurysm familial thoracic type 6. Multiple alternatively spliced variants, encoding the same protein, have been identified. ENSG00000107796 actin, alpha 2, smooth muscle, aorta NA
CRIP2 1397 This gene encodes a putative transcription factor with two LIM zinc-binding domains. The encoded protein may participate in the differentiation of smooth muscle tissue. Alternative splicing results in multiple transcript variants. ENSG00000182809 cysteine rich protein 2 NA
FSTL1 11167 This gene encodes a protein with similarity to follistatin, an activin-binding protein. It contains an FS module, a follistatin-like sequence containing 10 conserved cysteine residues. This gene product is thought to be an autoantigen associated with rheumatoid arthritis. ENSG00000163430 follistatin like 1 NA
TNNC1 7134 Troponin is a central regulatory protein of striated muscle contraction, and together with tropomyosin, is located on the actin filament. Troponin consists of 3 subunits: TnI, which is the inhibitor of actomyosin ATPase; TnT, which contains the binding site for tropomyosin; and TnC, the protein encoded by this gene. The binding of calcium to TnC abolishes the inhibitory action of TnI, thus allowing the interaction of actin with myosin, the hydrolysis of ATP, and the generation of tension. Mutations in this gene are associated with cardiomyopathy dilated type 1Z. ENSG00000114854 troponin C1, slow skeletal and cardiac type NA
MYH10 4628 This gene encodes a member of the myosin superfamily. The protein represents a conventional non-muscle myosin; it should not be confused with the unconventional myosin-10 (MYO10). Myosins are actin-dependent motor proteins with diverse functions including regulation of cytokinesis, cell motility, and cell polarity. Mutations in this gene have been associated with May-Hegglin anomaly and developmental defects in brain and heart. Multiple transcript variants encoding different isoforms have been found for this gene. ENSG00000133026 myosin, heavy chain 10, non-muscle NA
EFEMP1 2202 This gene encodes a member of the fibulin family of extracellular matrix glycoproteins. Like all members of this family, the encoded protein contains tandemly repeated epidermal growth factor-like repeats followed by a C-terminus fibulin-type domain. This gene is upregulated in malignant gliomas and may play a role in the aggressive nature of these tumors. Mutations in this gene are associated with Doyne honeycomb retinal dystrophy. Alternatively spliced transcript variants that encode the same protein have been described. ENSG00000115380 EGF containing fibulin like extracellular matrix protein 1 NA
THBS1 7057 The protein encoded by this gene is a subunit of a disulfide-linked homotrimeric protein. This protein is an adhesive glycoprotein that mediates cell-to-cell and cell-to-matrix interactions. This protein can bind to fibrinogen, fibronectin, laminin, type V collagen and integrins alpha-V/beta-1. This protein has been shown to play roles in platelet aggregation, angiogenesis, and tumorigenesis. ENSG00000137801 thrombospondin 1 NA
FBN1 2200 This gene encodes a member of the fibrillin family of proteins. The encoded preproprotein is proteolytically processed to generate two proteins including the extracellular matrix component fibrillin-1 and the protein hormone asprosin. Fibrillin-1 is an extracellular matrix glycoprotein that serves as a structural component of calcium-binding microfibrils. These microfibrils provide force-bearing structural support in elastic and nonelastic connective tissue throughout the body. Asprosin, secreted by white adipose tissue, has been shown to regulate glucose homeostasis. Mutations in this gene are associated with Marfan syndrome and the related MASS phenotype, as well as ectopia lentis syndrome, Weill-Marchesani syndrome, Shprintzen-Goldberg syndrome and neonatal progeroid syndrome. ENSG00000166147 fibrillin 1 NA
FBLN5 10516 The protein encoded by this gene is a secreted, extracellular matrix protein containing an Arg-Gly-Asp (RGD) motif and calcium-binding EGF-like domains. It promotes adhesion of endothelial cells through interaction of integrins and the RGD motif. It is prominently expressed in developing arteries but less so in adult vessels. However, its expression is reinduced in balloon-injured vessels and atherosclerotic lesions, notably in intimal vascular smooth muscle cells and endothelial cells. Therefore, the protein encoded by this gene may play a role in vascular development and remodeling. Defects in this gene are a cause of autosomal dominant cutis laxa, autosomal recessive cutis laxa type I (CL type I), and age-related macular degeneration type 3 (ARMD3). ENSG00000140092 fibulin 5 NA
MT2A 4502 NA ENSG00000125148 metallothionein 2A NA
MYL9 10398 Myosin, a structural component of muscle, consists of two heavy chains and four light chains. The protein encoded by this gene is a myosin light chain that may regulate muscle contraction by modulating the ATPase activity of myosin heads. The encoded protein binds calcium and is activated by myosin light chain kinase. Two transcript variants encoding different isoforms have been found for this gene. ENSG00000101335 myosin light chain 9 NA
CCDC80 151887 NA ENSG00000091986 coiled-coil domain containing 80 NA
ALDOA 226 The protein encoded by this gene, Aldolase A (fructose-bisphosphate aldolase), is a glycolytic enzyme that catalyzes the reversible conversion of fructose-1,6-bisphosphate to glyceraldehyde 3-phosphate and dihydroxyacetone phosphate. Three aldolase isozymes (A, B, and C), encoded by three different genes, are differentially expressed during development. Aldolase A is found in the developing embryo and is produced in even greater amounts in adult muscle. Aldolase A expression is repressed in adult liver, kidney and intestine and similar to aldolase C levels in brain and other nervous tissue. Aldolase A deficiency has been associated with myopathy and hemolytic anemia. Alternative splicing and alternative promoter usage result in multiple transcript variants. Related pseudogenes have been identified on chromosomes 3 and 10. ENSG00000149925 aldolase, fructose-bisphosphate A NA
CTSB 1508 This gene encodes a member of the C1 family of peptidases. Alternative splicing of this gene results in multiple transcript variants. At least one of these variants encodes a preproprotein that is proteolytically processed to generate multiple protein products. These products include the cathepsin B light and heavy chains, which can dimerize to form the double chain form of the enzyme. This enzyme is a lysosomal cysteine protease with both endopeptidase and exopeptidase activity that may play a role in protein turnover. It is also known as amyloid precursor protein secretase and is involved in the proteolytic processing of amyloid precursor protein (APP). Incomplete proteolytic processing of APP has been suggested to be a causative factor in Alzheimer’s disease, the most common cause of dementia. Overexpression of the encoded protein has been associated with esophageal adenocarcinoma and other tumors. Multiple pseudogenes of this gene have been identified. ENSG00000164733 cathepsin B NA
NA NA NA ENSG00000259716 NA TRUE
COL27A1 85301 This gene encodes a member of the fibrillar collagen family, and plays a role during the calcification of cartilage and the transition of cartilage to bone. The encoded protein product is a preproprotein. It includes an N-terminal signal peptide, which is followed by an N-terminal propeptide, mature peptide and a C-terminal propeptide. The N-terminal propeptide contains thrombospondin N-terminal-like and laminin G-like domains. The mature peptide is a major triple-helical region. The C-terminal propeptide, also known as COLFI domain, plays crucial roles in tissue growth and repair. Mutations in this gene cause Steel syndrome. Alternatively spliced transcript variants have been found, but the full-length nature of some variants has not been determined. ENSG00000196739 collagen type XXVII alpha 1 NA
FNBP1 23048 The protein encoded by this gene is a member of the formin-binding-protein family. The protein contains an N-terminal Fer/Cdc42-interacting protein 4 (CIP4) homology (FCH) domain followed by a coiled-coil domain, a proline-rich motif, a second coiled-coil domain, a Rho family protein-binding domain (RBD), and a C-terminal SH3 domain. This protein binds sorting nexin 2 (SNX2), tankyrase (TNKS), and dynamin; an interaction between this protein and formin has not been demonstrated yet in human. ENSG00000187239 formin binding protein 1 NA
TPM1 7168 This gene is a member of the tropomyosin family of highly conserved, widely distributed actin-binding proteins involved in the contractile system of striated and smooth muscles and the cytoskeleton of non-muscle cells. Tropomyosin is composed of two alpha-helical chains arranged as a coiled-coil. It is polymerized end to end along the two grooves of actin filaments and provides stability to the filaments. The encoded protein is one type of alpha helical chain that forms the predominant tropomyosin of striated muscle, where it also functions in association with the troponin complex to regulate the calcium-dependent interaction of actin and myosin during muscle contraction. In smooth muscle and non-muscle cells, alternatively spliced transcript variants encoding a range of isoforms have been described. Mutations in this gene are associated with type 3 familial hypertrophic cardiomyopathy. ENSG00000140416 tropomyosin 1 (alpha) NA
TGFBI 7045 This gene encodes an RGD-containing protein that binds to type I, II and IV collagens. The RGD motif is found in many extracellular matrix proteins modulating cell adhesion and serves as a ligand recognition sequence for several integrins. This protein plays a role in cell-collagen interactions and may be involved in endochondral bone formation in cartilage. The protein is induced by transforming growth factor-beta and acts to inhibit cell adhesion. Mutations in this gene are associated with multiple types of corneal dystrophy. ENSG00000120708 transforming growth factor beta induced NA
MYOM2 9172 The giant protein titin, together with its associated proteins, interconnects the major structure of sarcomeres, the M bands and Z discs. The C-terminal end of the titin string extends into the M line, where it binds tightly to M-band constituents of apparent molecular masses of 190 kD and 165 kD. The predicted MYOM2 protein contains 1,465 amino acids. Like MYOM1, MYOM2 has a unique N-terminal domain followed by 12 repeat domains with strong homology to either fibronectin type III or immunoglobulin C2 domains. Protein sequence comparisons suggested that the MYOM2 protein and bovine M protein are identical. ENSG00000036448 myomesin 2 NA
GLUL 2752 The protein encoded by this gene belongs to the glutamine synthetase family. It catalyzes the synthesis of glutamine from glutamate and ammonia in an ATP-dependent reaction. This protein plays a role in ammonia and glutamate detoxification, acid-base homeostasis, cell signaling, and cell proliferation. Glutamine is an abundant amino acid, and is important to the biosynthesis of several amino acids, pyrimidines, and purines. Mutations in this gene are associated with congenital glutamine deficiency, and overexpression of this gene was observed in some primary liver cancer samples. There are six pseudogenes of this gene found on chromosomes 2, 5, 9, 11, and 12. Alternative splicing results in multiple transcript variants. ENSG00000135821 glutamate-ammonia ligase NA
FABP4 2167 FABP4 encodes the fatty acid binding protein found in adipocytes. Fatty acid binding proteins are a family of small, highly conserved, cytoplasmic proteins that bind long-chain fatty acids and other hydrophobic ligands. It is thought that FABPs' roles include fatty acid uptake, transport, and metabolism. ENSG00000170323 fatty acid binding protein 4 NA
FLNB 2317 This gene encodes a member of the filamin family. The encoded protein interacts with glycoprotein Ib alpha as part of the process to repair vascular injuries. The platelet glycoprotein Ib complex includes glycoprotein Ib alpha, and it binds the actin cytoskeleton. Mutations in this gene have been found in several conditions: atelosteogenesis type 1 and type 3; boomerang dysplasia; autosomal dominant Larsen syndrome; and spondylocarpotarsal synostosis syndrome. Multiple alternatively spliced transcript variants that encode different protein isoforms have been described for this gene. ENSG00000136068 filamin B NA
NPPB 4879 This gene is a member of the natriuretic peptide family and encodes a secreted protein which functions as a cardiac hormone. The protein undergoes two cleavage events, one within the cell and a second after secretion into the blood. The protein’s biological actions include natriuresis, diuresis, vasorelaxation, inhibition of renin and aldosterone secretion, and a key role in cardiovascular homeostasis. A high concentration of this protein in the bloodstream is indicative of heart failure. The protein also acts as an antimicrobial peptide with antibacterial and antifungal activity. Mutations in this gene have been associated with postmenopausal osteoporosis. ENSG00000120937 natriuretic peptide B NA
MTND2P28 ENSG00000225630 NA ENSG00000225630 mitochondrially encoded NADH:ubiquinone oxidoreductase core subunit 2 pseudogene 28 NA
ACTA1 58 The product encoded by this gene belongs to the actin family of proteins, which are highly conserved proteins that play a role in cell motility, structure and integrity. Alpha, beta and gamma actin isoforms have been identified, with alpha actins being a major constituent of the contractile apparatus, while beta and gamma actins are involved in the regulation of cell motility. This actin is an alpha actin that is found in skeletal muscle. Mutations in this gene cause nemaline myopathy type 3, congenital myopathy with excess of thin myofilaments, congenital myopathy with cores, and congenital myopathy with fiber-type disproportion, diseases that lead to muscle fiber defects. ENSG00000143632 actin, alpha 1, skeletal muscle NA
ACTG2 72 Actins are highly conserved proteins that are involved in various types of cell motility and in the maintenance of the cytoskeleton. Three types of actins, alpha, beta and gamma, have been identified in vertebrates. Alpha actins are found in muscle tissues and are a major constituent of the contractile apparatus. The beta and gamma actins co-exist in most cell types as components of the cytoskeleton and as mediators of internal cell motility. This gene encodes actin gamma 2, a smooth muscle actin found in enteric tissues. Alternative splicing results in multiple transcript variants encoding distinct isoforms. Based on similarity to peptide cleavage of related actins, the mature protein of this gene is formed by removal of two N-terminal peptides. ENSG00000163017 actin, gamma 2, smooth muscle, enteric NA
GOLGA8A 23015 The Golgi apparatus, which participates in glycosylation and transport of proteins and lipids in the secretory pathway, consists of a series of stacked, flattened membrane sacs referred to as cisternae. Interactions between the Golgi and microtubules are thought to be important for the reorganization of the Golgi after it fragments during mitosis. The golgins constitute a family of proteins which are localized to the Golgi. This gene encodes a golgin which structurally resembles its family member GOLGA2, suggesting that they may share a similar function. There are many similar copies of this gene on chromosome 15. Alternative splicing results in multiple transcript variants. ENSG00000175265 golgin A8 family member A NA
DCN 1634 This gene encodes a member of the small leucine-rich proteoglycan family of proteins. Alternative splicing results in multiple transcript variants, at least one of which encodes a preproprotein that is proteolytically processed to generate the mature protein. This protein plays a role in collagen fibril assembly. Binding of this protein to multiple cell surface receptors mediates its role in tumor suppression, including a stimulatory effect on autophagy and inflammation and an inhibitory effect on angiogenesis and tumorigenesis. This gene and the related gene biglycan are thought to be the result of a gene duplication. Mutations in this gene are associated with congenital stromal corneal dystrophy in human patients. ENSG00000011465 decorin NA
TGM2 7052 Transglutaminases are enzymes that catalyze the crosslinking of proteins by epsilon-gamma glutamyl lysine isopeptide bonds. While the primary structure of transglutaminases is not conserved, they all have the same amino acid sequence at their active sites and their activity is calcium-dependent. The protein encoded by this gene acts as a monomer, is induced by retinoic acid, and appears to be involved in apoptosis. Finally, the encoded protein is the autoantigen implicated in celiac disease. Two transcript variants encoding different isoforms have been found for this gene. ENSG00000198959 transglutaminase 2 NA
NNMT 4837 N-methylation is one method by which drug and other xenobiotic compounds are metabolized by the liver. This gene encodes the protein responsible for this enzymatic activity which uses S-adenosyl methionine as the methyl donor. ENSG00000166741 nicotinamide N-methyltransferase NA
CASQ2 845 The protein encoded by this gene specifies the cardiac muscle family member of the calsequestrin family. Calsequestrin is localized to the sarcoplasmic reticulum in cardiac and slow skeletal muscle cells. The protein is a calcium binding protein that stores calcium for muscle function. Mutations in this gene cause stress-induced polymorphic ventricular tachycardia, also referred to as catecholaminergic polymorphic ventricular tachycardia 2 (CPVT2), a disease characterized by bidirectional ventricular tachycardia that may lead to cardiac arrest. ENSG00000118729 calsequestrin 2 NA
MGP 4256 The protein encoded by this gene is secreted and likely acts as an inhibitor of bone formation. The encoded protein is found in the organic matrix of bone and cartilage. Defects in this gene are a cause of Keutel syndrome (KS). Two transcript variants encoding different isoforms have been found for this gene. ENSG00000111341 matrix Gla protein NA
AEBP1 165 This gene encodes a member of the carboxypeptidase A protein family. The encoded protein may function as a transcriptional repressor and play a role in adipogenesis and smooth muscle cell differentiation. Studies in mice suggest that this gene functions in wound healing and abdominal wall development. Overexpression of this gene is associated with glioblastoma. ENSG00000106624 AE binding protein 1 NA
MFGE8 4240 This gene encodes a preproprotein that is proteolytically processed to form multiple protein products. The major encoded protein product, lactadherin, is a membrane glycoprotein that promotes phagocytosis of apoptotic cells. This protein has also been implicated in wound healing, autoimmune disease, and cancer. Lactadherin can be further processed to form a smaller cleavage product, medin, which comprises the major protein component of aortic medial amyloid (AMA). Alternative splicing results in multiple transcript variants. ENSG00000140545 milk fat globule-EGF factor 8 protein NA
KRT4 3851 The protein encoded by this gene is a member of the keratin gene family. The type II cytokeratins consist of basic or neutral proteins which are arranged in pairs of heterotypic keratin chains coexpressed during differentiation of simple and stratified epithelial tissues. This type II cytokeratin is specifically expressed in differentiated layers of the mucosal and esophageal epithelia with family member KRT13. Mutations in these genes have been associated with White Sponge Nevus, characterized by oral, esophageal, and anal leukoplakia. The type II cytokeratins are clustered in a region of chromosome 12q12-q13. ENSG00000170477 keratin 4 NA
LTBP1 4052 The protein encoded by this gene belongs to the family of latent TGF-beta binding proteins (LTBPs). The secretion and activation of TGF-betas is regulated by their association with latency-associated proteins and with latent TGF-beta binding proteins. The product of this gene targets latent complexes of transforming growth factor beta to the extracellular matrix, where the latent cytokine is subsequently activated by several different mechanisms. Alternatively spliced transcript variants encoding different isoforms have been identified. ENSG00000049323 latent transforming growth factor beta binding protein 1 NA
NDUFA4 4697 The protein encoded by this gene belongs to the complex I 9kDa subunit family. Mammalian complex I of mitochondrial respiratory chain is composed of 45 different subunits. This protein has NADH dehydrogenase activity and oxidoreductase activity. It transfers electrons from NADH to the respiratory chain. The immediate electron acceptor for the enzyme is believed to be ubiquinone. ENSG00000189043 NDUFA4, mitochondrial complex associated NA
RGS5 8490 This gene encodes a member of the regulators of G protein signaling (RGS) family. The RGS proteins are signal transduction molecules which are involved in the regulation of heterotrimeric G proteins by acting as GTPase activators. This gene is a hypoxia-inducible factor-1 dependent, hypoxia-induced gene which is involved in the induction of endothelial apoptosis. This gene is also one of three genes on chromosome 1q contributing to elevated blood pressure. Alternatively spliced transcript variants have been identified. ENSG00000143248 regulator of G-protein signaling 5 NA
MYL3 4634 MYL3 encodes myosin light chain 3, an alkali light chain also referred to in the literature as both the ventricular isoform and the slow skeletal muscle isoform. Mutations in MYL3 have been identified as a cause of mid-left ventricular chamber type hypertrophic cardiomyopathy. ENSG00000160808 myosin light chain 3 NA
PDE4DIP 9659 The protein encoded by this gene serves to anchor phosphodiesterase 4D to the Golgi/centrosome region of the cell. Defects in this gene may be a cause of myeloproliferative disorder (MBD) associated with eosinophilia. Several transcript variants encoding different isoforms have been found for this gene. ENSG00000178104 phosphodiesterase 4D interacting protein NA
LRP1 4035 This gene encodes a member of the low-density lipoprotein receptor family of proteins. The encoded preproprotein is proteolytically processed by furin to generate 515 kDa and 85 kDa subunits that form the mature receptor (PMID: 8546712). This receptor is involved in several cellular processes, including intracellular signaling, lipid homeostasis, and clearance of apoptotic cells. In addition, the encoded protein is necessary for the alpha 2-macroglobulin-mediated clearance of secreted amyloid precursor protein and beta-amyloid, the main component of amyloid plaques found in Alzheimer patients. Expression of this gene decreases with age and has been found to be lower than controls in brain tissue from Alzheimer’s disease patients. ENSG00000123384 LDL receptor related protein 1 NA
CALD1 800 This gene encodes a calmodulin- and actin-binding protein that plays an essential role in the regulation of smooth muscle and nonmuscle contraction. The conserved domain of this protein possesses the binding activities to Ca(2+)-calmodulin, actin, tropomyosin, myosin, and phospholipids. This protein is a potent inhibitor of the actin-tropomyosin activated myosin MgATPase, and serves as a mediating factor for Ca(2+)-dependent inhibition of smooth muscle contraction. Alternative splicing of this gene results in multiple transcript variants encoding distinct isoforms. ENSG00000122786 caldesmon 1 NA
LDB3 11155 This gene encodes a PDZ domain-containing protein. PDZ motifs are modular protein-protein interaction domains consisting of 80-120 amino acid residues. PDZ domain-containing proteins interact with each other in cytoskeletal assembly or with other proteins involved in targeting and clustering of membrane proteins. The protein encoded by this gene interacts with alpha-actinin-2 through its N-terminal PDZ domain and with protein kinase C via its C-terminal LIM domains. The LIM domain is a cysteine-rich motif defined by 50-60 amino acids containing two zinc-binding modules. This protein also interacts with all three members of the myozenin family. Mutations in this gene have been associated with myofibrillar myopathy and dilated cardiomyopathy. Alternatively spliced transcript variants encoding different isoforms have been identified; all isoforms have N-terminal PDZ domains while only longer isoforms (1, 2 and 5) have C-terminal LIM domains. ENSG00000122367 LIM domain binding 3 NA
PPP1R3C 5507 This gene encodes a regulatory subunit of protein phosphatase-1 (PP1). PP1 catalyzes reversible protein phosphorylation, which is important in a wide range of cellular activities: neuronal, muscular, RNA splicing, protein synthesis, cell death, and glycogen metabolism, to name just a few. By interacting with different regulatory subunits, PP1 is directed to different parts of the cell, to different substrates, or to respond to extracellular signals. ENSG00000119938 protein phosphatase 1 regulatory subunit 3C NA
TNNT2 7139 The protein encoded by this gene is the tropomyosin-binding subunit of the troponin complex, which is located on the thin filament of striated muscles and regulates muscle contraction in response to alterations in intracellular calcium ion concentration. Mutations in this gene have been associated with familial hypertrophic cardiomyopathy as well as with dilated cardiomyopathy. Transcripts for this gene undergo alternative splicing that results in many tissue-specific isoforms; however, the full-length nature of some of these variants has not yet been determined. ENSG00000118194 troponin T2, cardiac type NA
MYL4 4635 Myosin is a hexameric ATPase cellular motor protein. It is composed of two myosin heavy chains, two nonphosphorylatable myosin alkali light chains, and two phosphorylatable myosin regulatory light chains. This gene encodes a myosin alkali light chain that is found in embryonic muscle and adult atria. Two alternatively spliced transcript variants encoding the same protein have been found for this gene. ENSG00000198336 myosin light chain 4 NA
VCAN 1462 This gene is a member of the aggrecan/versican proteoglycan family. The protein encoded is a large chondroitin sulfate proteoglycan and is a major component of the extracellular matrix. This protein is involved in cell adhesion, proliferation, migration and angiogenesis and plays a central role in tissue morphogenesis and maintenance. Mutations in this gene are the cause of Wagner syndrome type 1. Multiple transcript variants encoding different isoforms have been found for this gene. ENSG00000038427 versican NA
MDH1 4190 This gene encodes an enzyme that catalyzes the NAD/NADH-dependent, reversible oxidation of malate to oxaloacetate in many metabolic pathways, including the citric acid cycle. Two main isozymes are known to exist in eukaryotic cells: one is found in the mitochondrial matrix and the other in the cytoplasm. This gene encodes the cytosolic isozyme, which plays a key role in the malate-aspartate shuttle that allows malate to pass through the mitochondrial membrane to be transformed into oxaloacetate for further cellular processes. Alternatively spliced transcript variants have been found for this gene. A recent study showed that a C-terminally extended isoform is produced by use of an alternative in-frame translation termination codon via a stop codon readthrough mechanism, and that this isoform is localized in the peroxisomes. Pseudogenes have been identified on chromosomes X and 6. ENSG00000014641 malate dehydrogenase 1 NA
ACTA2-AS1 ENSG00000180139 NA ENSG00000180139 ACTA2 antisense RNA 1 NA
HSPA2 3306 NA ENSG00000126803 heat shock protein family A (Hsp70) member 2 NA
IGFBP3 3486 This gene is a member of the insulin-like growth factor binding protein (IGFBP) family and encodes a protein with an IGFBP domain and a thyroglobulin type-I domain. The protein forms a ternary complex with insulin-like growth factor acid-labile subunit (IGFALS) and either insulin-like growth factor (IGF) I or II. In this form, it circulates in the plasma, prolonging the half-life of IGFs and altering their interaction with cell surface receptors. Alternate transcriptional splice variants, encoding different isoforms, have been characterized. ENSG00000146674 insulin like growth factor binding protein 3 NA
DKK1 22943 This gene encodes a protein that is a member of the dickkopf family. It is a secreted protein with two cysteine rich regions and is involved in embryonic development through its inhibition of the WNT signaling pathway. Elevated levels of DKK1 in bone marrow plasma and peripheral blood are associated with the presence of osteolytic bone lesions in patients with multiple myeloma. ENSG00000107984 dickkopf WNT signaling pathway inhibitor 1 NA
RPL3 6122 Ribosomes, the complexes that catalyze protein synthesis, consist of a small 40S subunit and a large 60S subunit. Together these subunits are composed of 4 RNA species and approximately 80 structurally distinct proteins. This gene encodes a ribosomal protein that is a component of the 60S subunit. The protein belongs to the L3P family of ribosomal proteins and it is located in the cytoplasm. The protein can bind to the HIV-1 TAR mRNA, and it has been suggested that the protein contributes to tat-mediated transactivation. This gene is co-transcribed with several small nucleolar RNA genes, which are located in several of this gene’s introns. Alternate transcriptional splice variants, encoding different isoforms, have been characterized. As is typical for genes encoding ribosomal proteins, there are multiple processed pseudogenes of this gene dispersed through the genome. ENSG00000100316 ribosomal protein L3 NA
MRC2 9902 This gene encodes a member of the mannose receptor family of proteins that contain a fibronectin type II domain and multiple C-type lectin-like domains. The encoded protein plays a role in extracellular matrix remodeling by mediating the internalization and lysosomal degradation of collagen ligands. Expression of this gene may play a role in the tumorigenesis and metastasis of several malignancies including breast cancer, gliomas and metastatic bone disease. ENSG00000011028 mannose receptor C type 2 NA
FBLN1 2192 Fibulin 1 is a secreted glycoprotein that becomes incorporated into a fibrillar extracellular matrix. Calcium-binding is apparently required to mediate its binding to laminin and nidogen. It mediates platelet adhesion via binding fibrinogen. Four splice variants which differ in the 3’ end have been identified. Each variant encodes a different isoform, but no functional distinctions have been identified among the four variants. ENSG00000077942 fibulin 1 NA
LDLRAP1 26119 The protein encoded by this gene is a cytosolic protein which contains a phosphotyrosine binding (PTB) domain. The PTB domain has been found to interact with the cytoplasmic tail of the LDL receptor. Mutations in this gene lead to LDL receptor malfunction and cause the disorder autosomal recessive hypercholesterolaemia. ENSG00000157978 low density lipoprotein receptor adaptor protein 1 NA
RPS6 6194 Ribosomes, the organelles that catalyze protein synthesis, consist of a small 40S subunit and a large 60S subunit. Together these subunits are composed of 4 RNA species and approximately 80 structurally distinct proteins. This gene encodes a cytoplasmic ribosomal protein that is a component of the 40S subunit. The protein belongs to the S6E family of ribosomal proteins. It is the major substrate of protein kinases in the ribosome, with subsets of five C-terminal serine residues phosphorylated by different protein kinases. Phosphorylation is induced by a wide range of stimuli, including growth factors, tumor-promoting agents, and mitogens. Dephosphorylation occurs at growth arrest. The protein may contribute to the control of cell growth and proliferation through the selective translation of particular classes of mRNA. As is typical for genes encoding ribosomal proteins, there are multiple processed pseudogenes of this gene dispersed through the genome. ENSG00000137154 ribosomal protein S6 NA
GPD1 2819 This gene encodes a member of the NAD-dependent glycerol-3-phosphate dehydrogenase family. The encoded protein plays a critical role in carbohydrate and lipid metabolism by catalyzing the reversible conversion of dihydroxyacetone phosphate (DHAP) and reduced nicotinamide adenine dinucleotide (NADH) to glycerol-3-phosphate (G3P) and NAD+. The encoded cytosolic protein and mitochondrial glycerol-3-phosphate dehydrogenase also form a glycerol phosphate shuttle that facilitates the transfer of reducing equivalents from the cytosol to mitochondria. Mutations in this gene are a cause of transient infantile hypertriglyceridemia. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene. ENSG00000167588 glycerol-3-phosphate dehydrogenase 1 NA
NOV 4856 The protein encoded by this gene is a small secreted cysteine-rich protein and a member of the CCN family of regulatory proteins. CCN family proteins associate with the extracellular matrix and play an important role in cardiovascular and skeletal development, fibrosis and cancer development. ENSG00000136999 nephroblastoma overexpressed NA
PTGDS 5730 The protein encoded by this gene is a glutathione-independent prostaglandin D synthase that catalyzes the conversion of prostaglandin H2 (PGH2) to prostaglandin D2 (PGD2). PGD2 functions as a neuromodulator as well as a trophic factor in the central nervous system. PGD2 is also involved in smooth muscle contraction/relaxation and is a potent inhibitor of platelet aggregation. This gene is preferentially expressed in brain. Studies with transgenic mice overexpressing this gene suggest that this gene may also be involved in the regulation of non-rapid eye movement sleep. ENSG00000107317 prostaglandin D2 synthase NA
COL5A2 1290 This gene encodes an alpha chain for one of the low abundance fibrillar collagens. Fibrillar collagen molecules are trimers that can be composed of one or more types of alpha chains. Type V collagen is found in tissues containing type I collagen and appears to regulate the assembly of heterotypic fibers composed of both type I and type V collagen. This gene product is closely related to type XI collagen and it is possible that the collagen chains of types V and XI constitute a single collagen type with tissue-specific chain combinations. Mutations in this gene are associated with Ehlers-Danlos syndrome, types I and II. ENSG00000204262 collagen type V alpha 2 chain NA
CTGF 1490 The protein encoded by this gene is a mitogen that is secreted by vascular endothelial cells. The encoded protein plays a role in chondrocyte proliferation and differentiation, cell adhesion in many cell types, and is related to platelet-derived growth factor. Certain polymorphisms in this gene have been linked with a higher incidence of systemic sclerosis. ENSG00000118523 connective tissue growth factor NA
NA NA NA ENSG00000163486 NA TRUE
GPNMB 10457 The protein encoded by this gene is a type I transmembrane glycoprotein which shows homology to the pMEL17 precursor, a melanocyte-specific protein. GPNMB shows expression in the lowly metastatic human melanoma cell lines and xenografts but does not show expression in the highly metastatic cell lines. GPNMB may be involved in growth delay and reduction of metastatic potential. Two transcript variants encoding different isoforms have been found for this gene. ENSG00000136235 glycoprotein nmb NA
TNC 3371 This gene encodes an extracellular matrix protein with a spatially and temporally restricted tissue distribution. This protein is homohexameric with disulfide-linked subunits, and contains multiple EGF-like and fibronectin type-III domains. It is implicated in guidance of migrating neurons as well as axons during development, synaptic plasticity, and neuronal regeneration. ENSG00000041982 tenascin C NA
PNPLA2 57104 This gene encodes an enzyme which catalyzes the first step in the hydrolysis of triglycerides in adipose tissue. Mutations in this gene are associated with neutral lipid storage disease with myopathy. ENSG00000177666 patatin like phospholipase domain containing 2 NA
PLIN1 5346 The protein encoded by this gene coats lipid storage droplets in adipocytes, thereby protecting them until they can be broken down by hormone-sensitive lipase. The encoded protein is the major cAMP-dependent protein kinase substrate in adipocytes and, when unphosphorylated, may play a role in the inhibition of lipolysis. Alternatively spliced transcript variants varying in the 5’ UTR, but encoding the same protein, have been found for this gene. ENSG00000166819 perilipin 1 NA
SLC25A3 5250 The protein encoded by this gene catalyzes the transport of phosphate into the mitochondrial matrix, either by proton cotransport or in exchange for hydroxyl ions. The protein contains three related segments arranged in tandem which are related to those found in other characterized members of the mitochondrial carrier family. Both the N-terminal and C-terminal regions of this protein protrude toward the cytosol. Multiple alternatively spliced transcript variants have been isolated. ENSG00000075415 solute carrier family 25 member 3 NA
CYB5R3 1727 This gene encodes cytochrome b5 reductase, which includes a membrane-bound form in somatic cells (anchored in the endoplasmic reticulum, mitochondrial and other membranes) and a soluble form in erythrocytes. The membrane-bound form exists mainly on the cytoplasmic side of the endoplasmic reticulum and functions in desaturation and elongation of fatty acids, in cholesterol biosynthesis, and in drug metabolism. The erythrocyte form is located in a soluble fraction of circulating erythrocytes and is involved in methemoglobin reduction. The membrane-bound form has both membrane-binding and catalytic domains, while the soluble form has only the catalytic domain. Alternate splicing results in multiple transcript variants. Mutations in this gene cause methemoglobinemias. ENSG00000100243 cytochrome b5 reductase 3 NA
CDH2 1000 This gene encodes a classical cadherin and member of the cadherin superfamily. Alternative splicing results in multiple transcript variants, at least one of which encodes a preproprotein that is proteolytically processed to generate a calcium-dependent cell adhesion molecule and glycoprotein. This protein plays a role in the establishment of left-right asymmetry, development of the nervous system and the formation of cartilage and bone. ENSG00000170558 cadherin 2 NA
PALLD 23022 This gene encodes a cytoskeletal protein that is required for organizing the actin cytoskeleton. The protein is a component of actin-containing microfilaments, and it is involved in the control of cell shape, adhesion, and contraction. Polymorphisms in this gene are associated with a susceptibility to pancreatic cancer type 1, and also with a risk for myocardial infarction. Alternative splicing results in multiple transcript variants. ENSG00000129116 palladin, cytoskeletal associated protein NA
TIAM1 7074 NA ENSG00000156299 T-cell lymphoma invasion and metastasis 1 NA
write.table(as.factor(out$query), paste0("../utilities/GTEX2013_sparse_load_sqrt/gene_names_clus_",14,".txt"), col.names = FALSE,
            row.names=FALSE, quote=FALSE);
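
Rows such as ENSG00000163486 in the table above carry `notfound = TRUE`, meaning the query returned no annotation for that Ensembl ID. The following is a minimal sketch, not part of the original analysis, of how such rows could be separated from the annotated genes before tabulating or writing them out. The helper name `split_annotations` and the toy data frame are hypothetical; the sketch assumes only that the annotation table has `query` and (optionally) `notfound` columns, as in `as.data.frame(out)`.

```r
# Hypothetical helper (an assumption, not from the original analysis):
# split an annotation table into resolved rows and unresolved query IDs.
split_annotations <- function(ann) {
  ann <- as.data.frame(ann)
  # `notfound` is TRUE for unresolved IDs and NA otherwise; the column may be
  # missing altogether when every query was resolved (as in the Factor 1 table).
  missing <- if ("notfound" %in% names(ann)) {
    ann$notfound %in% TRUE
  } else {
    rep(FALSE, nrow(ann))
  }
  list(found = ann[!missing, , drop = FALSE],
       unresolved = as.character(ann$query[missing]))
}

# Toy data mimicking three rows of the Factor 14 table
toy <- data.frame(
  query = c("ENSG00000118523", "ENSG00000163486", "ENSG00000136235"),
  symbol = c("CTGF", NA, "GPNMB"),
  notfound = c(NA, TRUE, NA),
  stringsAsFactors = FALSE
)
res <- split_annotations(toy)
res$unresolved  # IDs that would need manual lookup
```

With the real query result, `split_annotations(out)$found` could be passed to `kable()` in place of `as.data.frame(out)`, keeping the unresolved IDs in a separate list for manual follow-up.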

Factor 15 Annotations

out <- mygene::queryMany(gene_list[15,],  scopes="ensembl.gene", fields=c("name", "summary", "symbol"), species="human");
## Finished
## Pass returnall=TRUE to return lists of duplicate or missing query terms.
kable(as.data.frame(out))
symbol X_id name query summary notfound
PRSS1 5644 protease, serine 1 ENSG00000204983 This gene encodes a trypsinogen, which is a member of the trypsin family of serine proteases. This enzyme is secreted by the pancreas and cleaved to its active form in the small intestine. It is active on peptide linkages involving the carboxyl group of lysine or arginine. Mutations in this gene are associated with hereditary pancreatitis. This gene and several other trypsinogen genes are localized to the T cell receptor beta locus on chromosome 7. NA
CPA1 1357 carboxypeptidase A1 ENSG00000091704 This gene encodes a member of the carboxypeptidase A family of zinc metalloproteases. This enzyme is produced in the pancreas and preferentially cleaves C-terminal branched-chain and aromatic amino acids from dietary proteins. This gene and several family members are present in a gene cluster on chromosome 7. Mutations in this gene may be linked to chronic pancreatitis, while elevated protein levels may be associated with pancreatic cancer. NA
PNLIP 5406 pancreatic lipase ENSG00000175535 This gene is a member of the lipase gene family. It encodes a carboxyl esterase that hydrolyzes insoluble, emulsified triglycerides, and is essential for the efficient digestion of dietary fats. This gene is expressed specifically in the pancreas. NA
CELA3A 10136 chymotrypsin like elastase family member 3A ENSG00000142789 Elastases form a subfamily of serine proteases that hydrolyze many proteins in addition to elastin. Humans have six elastase genes which encode the structurally similar proteins elastase 1, 2, 2A, 2B, 3A, and 3B. Unlike other elastases, elastase 3A has little elastolytic activity. Like most of the human elastases, elastase 3A is secreted from the pancreas as a zymogen and, like other serine proteases such as trypsin, chymotrypsin and kallikrein, it has a digestive function in the intestine. Elastase 3A preferentially cleaves proteins after alanine residues. Elastase 3A may also function in the intestinal transport and metabolism of cholesterol. Both elastase 3A and elastase 3B have been referred to as protease E and as elastase 1. NA
GP2 2813 glycoprotein 2 ENSG00000169347 This gene encodes an integral membrane protein that is secreted from intracellular zymogen granules and associates with the plasma membrane via glycosylphosphatidylinositol (GPI) linkage. The encoded protein binds pathogens such as enterobacteria, thereby playing an important role in the innate immune response. The C-terminus of this protein is related to the C-terminus of the protein encoded by the neighboring gene, uromodulin (UMOD). Alternative splicing results in multiple transcript variants. NA
MBP 4155 myelin basic protein ENSG00000197971 The protein encoded by the classic MBP gene is a major constituent of the myelin sheath of oligodendrocytes and Schwann cells in the nervous system. However, MBP-related transcripts are also present in the bone marrow and the immune system. These mRNAs arise from the long MBP gene (otherwise called ‘Golli-MBP’) that contains 3 additional exons located upstream of the classic MBP exons. Alternative splicing from the Golli and the MBP transcription start sites gives rise to 2 sets of MBP-related transcripts and gene products. The Golli mRNAs contain 3 exons unique to Golli-MBP, spliced in-frame to 1 or more MBP exons. They encode hybrid proteins that have N-terminal Golli aa sequence linked to MBP aa sequence. The second family of transcripts contain only MBP exons and produce the well characterized myelin basic proteins. This complex gene structure is conserved among species suggesting that the MBP transcription unit is an integral part of the Golli transcription unit and that this arrangement is important for the function and/or regulation of these genes. NA
CPB1 1360 carboxypeptidase B1 ENSG00000153002 Three different procarboxypeptidases A and two different procarboxypeptidases B have been isolated. The B1 and B2 forms differ from each other mainly in isoelectric point. Carboxypeptidase B1 is a highly tissue-specific protein and is a useful serum marker for acute pancreatitis and dysfunction of pancreatic transplants. It is not elevated in pancreatic carcinoma. NA
CLPS 1208 colipase ENSG00000137392 The protein encoded by this gene is a cofactor needed by pancreatic lipase for efficient dietary lipid hydrolysis. It binds to the C-terminal, non-catalytic domain of lipase, thereby stabilizing an active conformation and considerably increasing the overall hydrophobic binding site. The gene product allows lipase to anchor noncovalently to the surface of lipid micelles, counteracting the destabilizing influence of intestinal bile salts. This cofactor is only expressed in pancreatic acinar cells, suggesting regulation of expression by tissue-specific elements. Three transcript variants encoding different isoforms have been found for this gene. NA
CELA3B 23436 chymotrypsin like elastase family member 3B ENSG00000219073 Elastases form a subfamily of serine proteases that hydrolyze many proteins in addition to elastin. Humans have six elastase genes which encode the structurally similar proteins elastase 1, 2, 2A, 2B, 3A, and 3B. Unlike other elastases, elastase 3B has little elastolytic activity. Like most of the human elastases, elastase 3B is secreted from the pancreas as a zymogen and, like other serine proteases such as trypsin, chymotrypsin and kallikrein, it has a digestive function in the intestine. Elastase 3B preferentially cleaves proteins after alanine residues. Elastase 3B may also function in the intestinal transport and metabolism of cholesterol. Both elastase 3A and elastase 3B have been referred to as protease E and as elastase 1, and excretion of this protein in fecal material is frequently used as a measure of pancreatic function in clinical assays. NA
CTRB2 440387 chymotrypsinogen B2 ENSG00000168928 NA NA
CTRB1 1504 chymotrypsinogen B1 ENSG00000168925 The protein encoded by this gene is one of a family of serine proteases that is secreted into the gastrointestinal tract as an inactive precursor, which is activated by proteolytic cleavage with trypsin. NA
CEL 1056 carboxyl ester lipase ENSG00000170835 The protein encoded by this gene is a glycoprotein secreted from the pancreas into the digestive tract and from the lactating mammary gland into human milk. The physiological role of this protein is in cholesterol and lipid-soluble vitamin ester hydrolysis and absorption. This encoded protein promotes large chylomicron production in the intestine. Also its presence in plasma suggests its interactions with cholesterol and oxidized lipoproteins to modulate the progression of atherosclerosis. In pancreatic tumoral cells, this encoded protein is thought to be sequestrated within the Golgi compartment and is probably not secreted. This gene contains a variable number of tandem repeat (VNTR) polymorphism in the coding region that may influence the function of the encoded protein. NA
AMY2B 280 amylase, alpha 2B (pancreatic) ENSG00000240038 Amylases are secreted proteins that hydrolyze 1,4-alpha-glucoside bonds in oligosaccharides and polysaccharides, and thus catalyze the first step in digestion of dietary starch and glycogen. The human genome has a cluster of several amylase genes that are expressed at high levels in either salivary gland or pancreas. This gene encodes an amylase isoenzyme produced by the pancreas. NA
HBB 3043 hemoglobin subunit beta ENSG00000244734 The alpha (HBA) and beta (HBB) loci determine the structure of the 2 types of polypeptide chains in adult hemoglobin, Hb A. The normal adult hemoglobin tetramer consists of two alpha chains and two beta chains. Mutant beta globin causes sickle cell anemia. Absence of beta chain causes beta-zero-thalassemia. Reduced amounts of detectable beta globin cause beta-plus-thalassemia. The order of the genes in the beta-globin cluster is 5’-epsilon – gamma-G – gamma-A – delta – beta–3’. NA
REG1A 5967 regenerating family member 1 alpha ENSG00000115386 This gene is a type I subclass member of the Reg gene family. The Reg gene family is a multigene family grouped into four subclasses, types I, II, III and IV, based on the primary structures of the encoded proteins. This gene encodes a protein that is secreted by the exocrine pancreas. It is associated with islet cell regeneration and diabetogenesis and may be involved in pancreatic lithogenesis. Reg family members REG1B, REGL, PAP and this gene are tandemly clustered on chromosome 2p12 and may have arisen from the same ancestral gene by gene duplication. NA
CPA2 1358 carboxypeptidase A2 ENSG00000158516 Three different forms of human pancreatic procarboxypeptidase A have been isolated. The encoded protein represents the A2 form, which is a monomeric protein with different biochemical properties from the A1 and A3 forms. The A2 form of pancreatic procarboxypeptidase acts on aromatic C-terminal residues and is a secreted protein. NA
KRT13 3860 keratin 13 ENSG00000171401 The protein encoded by this gene is a member of the keratin gene family. The keratins are intermediate filament proteins responsible for the structural integrity of epithelial cells and are subdivided into cytokeratins and hair keratins. Most of the type I cytokeratins consist of acidic proteins which are arranged in pairs of heterotypic keratin chains. This type I cytokeratin is paired with keratin 4 and expressed in the suprabasal layers of non-cornified stratified epithelia. Mutations in this gene and keratin 4 have been associated with the autosomal dominant disorder White Sponge Nevus. The type I cytokeratins are clustered in a region of chromosome 17q21.2. Alternative splicing of this gene results in multiple transcript variants; however, not all variants have been described. NA
CELA2A 63036 chymotrypsin like elastase family member 2A ENSG00000142615 Elastases form a subfamily of serine proteases that hydrolyze many proteins in addition to elastin. Humans have six elastase genes which encode the structurally similar proteins elastase 1, 2, 2A, 2B, 3A, and 3B. Like most of the human elastases, elastase 2A is secreted from the pancreas as a zymogen. In other species, elastase 2A has been shown to preferentially cleave proteins after leucine, methionine, and phenylalanine residues. NA
AMY2A 279 amylase, alpha 2A (pancreatic) ENSG00000243480 This gene encodes a member of the alpha-amylase family of proteins. Amylases are secreted proteins that hydrolyze 1,4-alpha-glucoside bonds in oligosaccharides and polysaccharides, catalyzing the first step in digestion of dietary starch and glycogen. This gene and several family members are present in a gene cluster on chromosome 1. This gene encodes an amylase isoenzyme produced by the pancreas. NA
CTRC 11330 chymotrypsin C ENSG00000162438 This gene encodes a member of the peptidase S1 family. The encoded protein is a serum calcium-decreasing factor that has chymotrypsin-like protease activity. Alternatively spliced transcript variants have been observed, but their full-length nature has not been determined. NA
PLA2G1B 5319 phospholipase A2 group IB ENSG00000170890 This gene encodes a secreted member of the phospholipase A2 (PLA2) class of enzymes, which is produced by the pancreatic acinar cells. The encoded calcium-dependent enzyme catalyzes the hydrolysis of the sn-2 position of membrane glycerophospholipids to release arachidonic acid (AA) and lysophospholipids. AA is subsequently converted by downstream metabolic enzymes to several bioactive lipophilic compounds (eicosanoids), including prostaglandins (PGs) and leukotrienes (LTs). The enzyme may be involved in several physiological processes including cell contraction, cell proliferation and pathological response. NA
PNLIPRP1 5407 pancreatic lipase related protein 1 ENSG00000187021 NA NA
AHNAK 79026 AHNAK nucleoprotein ENSG00000124942 NA NA
IGFBP5 3488 insulin like growth factor binding protein 5 ENSG00000115461 NA NA
RP11-862L9.3 ENSG00000266844 NA ENSG00000266844 NA NA
FN1 2335 fibronectin 1 ENSG00000115414 This gene encodes fibronectin, a glycoprotein present in a soluble dimeric form in plasma, and in a dimeric or multimeric form at the cell surface and in extracellular matrix. The encoded preproprotein is proteolytically processed to generate the mature protein. Fibronectin is involved in cell adhesion and migration processes including embryogenesis, wound healing, blood coagulation, host defense, and metastasis. The gene has three regions subject to alternative splicing, with the potential to produce 20 different transcript variants, at least one of which encodes an isoform that undergoes proteolytic processing. The full-length nature of some variants has not been determined. NA
KRT4 3851 keratin 4 ENSG00000170477 The protein encoded by this gene is a member of the keratin gene family. The type II cytokeratins consist of basic or neutral proteins which are arranged in pairs of heterotypic keratin chains coexpressed during differentiation of simple and stratified epithelial tissues. This type II cytokeratin is specifically expressed in differentiated layers of the mucosal and esophageal epithelia with family member KRT13. Mutations in these genes have been associated with White Sponge Nevus, characterized by oral, esophageal, and anal leukoplakia. The type II cytokeratins are clustered in a region of chromosome 12q12-q13. NA
NA NA NA ENSG00000250606 NA TRUE
HBA2 3040 hemoglobin subunit alpha 2 ENSG00000188536 The human alpha globin gene cluster located on chromosome 16 spans about 30 kb and includes seven loci: 5’- zeta - pseudozeta - mu - pseudoalpha-1 - alpha-2 - alpha-1 - theta - 3’. The alpha-2 (HBA2) and alpha-1 (HBA1) coding sequences are identical. These genes differ slightly over the 5’ untranslated regions and the introns, but they differ significantly over the 3’ untranslated regions. Two alpha chains plus two beta chains constitute HbA, which in normal adult life comprises about 97% of the total hemoglobin; alpha chains combine with delta chains to constitute HbA-2, which with HbF (fetal hemoglobin) makes up the remaining 3% of adult hemoglobin. Alpha thalassemias result from deletions of each of the alpha genes as well as deletions of both HBA2 and HBA1; some nondeletion alpha thalassemias have also been reported. NA
TPM2 7169 tropomyosin 2 (beta) ENSG00000198467 This gene encodes beta-tropomyosin, a member of the actin filament binding protein family, and mainly expressed in slow, type 1 muscle fibers. Mutations in this gene can alter the expression of other sarcomeric tropomyosin proteins, and cause cap disease, nemaline myopathy and distal arthrogryposis syndromes. Alternatively spliced transcript variants encoding different isoforms have been found for this gene. NA
NA NA NA ENSG00000165862 NA TRUE
REG1B 5968 regenerating family member 1 beta ENSG00000172023 This gene is a type I subclass member of the Reg gene family. The Reg gene family is a multigene family grouped into four subclasses, types I, II, III and IV based on the primary structures of the encoded proteins. This gene encodes a protein secreted by the exocrine pancreas that is highly similar to the REG1A protein. The related REG1A protein is associated with islet cell regeneration and diabetogenesis, and may be involved in pancreatic lithogenesis. Reg family members REG1A, REGL, PAP and this gene are tandemly clustered on chromosome 2p12 and may have arisen from the same ancestral gene by gene duplication. NA
RP11-331F4.4 ENSG00000240338 NA ENSG00000240338 NA NA
GSN 2934 gelsolin ENSG00000148180 The protein encoded by this gene binds to the ‘plus’ ends of actin monomers and filaments to prevent monomer exchange. The encoded calcium-regulated protein functions in both assembly and disassembly of actin filaments. Defects in this gene are a cause of familial amyloidosis Finnish type (FAF). Multiple transcript variants encoding several different isoforms have been found for this gene. NA
SYTL1 84958 synaptotagmin like 1 ENSG00000142765 NA NA
SEL1L 6400 SEL1L ERAD E3 ligase adaptor subunit ENSG00000071537 The protein encoded by this gene is part of a protein complex required for the retrotranslocation or dislocation of misfolded proteins from the endoplasmic reticulum lumen to the cytosol, where they are degraded by the proteasome in a ubiquitin-dependent manner. Alternatively spliced transcript variants encoding different isoforms have been found for this gene. NA
SPRR3 6707 small proline rich protein 3 ENSG00000163209 NA NA
PTGDS 5730 prostaglandin D2 synthase ENSG00000107317 The protein encoded by this gene is a glutathione-independent prostaglandin D synthase that catalyzes the conversion of prostaglandin H2 (PGH2) to prostaglandin D2 (PGD2). PGD2 functions as a neuromodulator as well as a trophic factor in the central nervous system. PGD2 is also involved in smooth muscle contraction/relaxation and is a potent inhibitor of platelet aggregation. This gene is preferentially expressed in brain. Studies with transgenic mice overexpressing this gene suggest that this gene may also be involved in the regulation of non-rapid eye movement sleep. NA
HSP90AA1 3320 heat shock protein 90kDa alpha family class A member 1 ENSG00000080824 The protein encoded by this gene is an inducible molecular chaperone that functions as a homodimer. The encoded protein aids in the proper folding of specific target proteins by use of an ATPase activity that is modulated by co-chaperones. Two transcript variants encoding different isoforms have been found for this gene. NA
COL6A3 1293 collagen type VI alpha 3 chain ENSG00000163359 This gene encodes the alpha-3 chain, one of the three alpha chains of type VI collagen, a beaded filament collagen found in most connective tissues. The alpha-3 chain of type VI collagen is much larger than the alpha-1 and -2 chains. This difference in size is largely due to an increase in the number of subdomains, similar to von Willebrand Factor type A domains, that are found in the amino terminal globular domain of all the alpha chains. These domains have been shown to bind extracellular matrix proteins, an interaction that explains the importance of this collagen in organizing matrix components. Mutations in the type VI collagen genes are associated with Bethlem myopathy, a rare autosomal dominant proximal myopathy with early childhood onset. Mutations in this gene are also a cause of Ullrich congenital muscular dystrophy, also referred to as Ullrich scleroatonic muscular dystrophy, an autosomal recessive congenital myopathy that is more severe than Bethlem myopathy. Multiple transcript variants have been identified, but the full-length nature of only some of these variants has been described. NA
MTURN 222166 maturin, neural progenitor differentiation regulator homolog (Xenopus) ENSG00000180354 NA NA
NSMF 26012 NMDA receptor synaptonuclear signaling and neuronal migration factor ENSG00000165802 The protein encoded by this gene is involved in guidance of olfactory axon projections and migration of luteinizing hormone-releasing hormone neurons. Defects in this gene are a cause of idiopathic hypogonadotropic hypogonadism (IHH). Several transcript variants encoding different isoforms have been found for this gene. NA
CLU 1191 clusterin ENSG00000120885 The protein encoded by this gene is a secreted chaperone that can under some stress conditions also be found in the cell cytosol. It has been suggested to be involved in several basic biological events such as cell death, tumor progression, and neurodegenerative disorders. Alternate splicing results in both coding and non-coding variants. NA
MT2A 4502 metallothionein 2A ENSG00000125148 NA NA
SYCN 342898 syncollin ENSG00000179751 NA NA
AC019349.5 ENSG00000229732 NA ENSG00000229732 NA NA
UBB 7314 ubiquitin B ENSG00000170315 This gene encodes ubiquitin, one of the most conserved proteins known. Ubiquitin has a major role in targeting cellular proteins for degradation by the 26S proteasome. It is also involved in the maintenance of chromatin structure, the regulation of gene expression, and the stress response. Ubiquitin is synthesized as a precursor protein consisting of either polyubiquitin chains or a single ubiquitin moiety fused to an unrelated protein. This gene consists of three direct repeats of the ubiquitin coding sequence with no spacer sequence. Consequently, the protein is expressed as a polyubiquitin precursor with a final amino acid after the last repeat. An aberrant form of this protein has been detected in patients with Alzheimer’s disease and Down syndrome. Pseudogenes of this gene are located on chromosomes 1, 2, 13, and 17. Alternative splicing results in multiple transcript variants. NA
CCDC136 64753 coiled-coil domain containing 136 ENSG00000128596 NA NA
PLVAP 83483 plasmalemma vesicle associated protein ENSG00000130300 NA NA
TF 7018 transferrin ENSG00000091513 This gene encodes a glycoprotein with an approximate molecular weight of 76.5 kDa. It is thought to have been created as a result of an ancient gene duplication event that led to generation of homologous C and N-terminal domains each of which binds one ion of ferric iron. The function of this protein is to transport iron from the intestine, reticuloendothelial system, and liver parenchymal cells to all proliferating cells in the body. This protein may also have a physiologic role as granulocyte/pollen-binding protein (GPBP) involved in the removal of certain organic matter and allergens from serum. NA
PLCH2 9651 phospholipase C eta 2 ENSG00000149527 PLCH2 is a member of the PLC-eta family of the phosphoinositide-specific phospholipase C (PLC) superfamily of enzymes that cleave PtdIns(4,5)P2 to generate second messengers inositol 1,4,5-trisphosphate and diacylglycerol (Zhou et al., 2005 [PubMed 16107206]). NA
HBA1 3039 hemoglobin subunit alpha 1 ENSG00000206172 The human alpha globin gene cluster located on chromosome 16 spans about 30 kb and includes seven loci: 5’- zeta - pseudozeta - mu - pseudoalpha-1 - alpha-2 - alpha-1 - theta - 3’. The alpha-2 (HBA2) and alpha-1 (HBA1) coding sequences are identical. These genes differ slightly over the 5’ untranslated regions and the introns, but they differ significantly over the 3’ untranslated regions. Two alpha chains plus two beta chains constitute HbA, which in normal adult life comprises about 97% of the total hemoglobin; alpha chains combine with delta chains to constitute HbA-2, which with HbF (fetal hemoglobin) makes up the remaining 3% of adult hemoglobin. Alpha thalassemias result from deletions of each of the alpha genes as well as deletions of both HBA2 and HBA1; some nondeletion alpha thalassemias have also been reported. NA
ITM2C 81618 integral membrane protein 2C ENSG00000135916 NA NA
RPS3 6188 ribosomal protein S3 ENSG00000149273 Ribosomes, the organelles that catalyze protein synthesis, consist of a small 40S subunit and a large 60S subunit. Together these subunits are composed of 4 RNA species and approximately 80 structurally distinct proteins. This gene encodes a ribosomal protein that is a component of the 40S subunit, where it forms part of the domain where translation is initiated. The protein belongs to the S3P family of ribosomal proteins. Studies of the mouse and rat proteins have demonstrated that the protein has an extraribosomal role as an endonuclease involved in the repair of UV-induced DNA damage. The protein appears to be located in both the cytoplasm and nucleus but not in the nucleolus. Higher levels of expression of this gene in colon adenocarcinomas and adenomatous polyps compared to adjacent normal colonic mucosa have been observed. This gene is co-transcribed with the small nucleolar RNA genes U15A and U15B, which are located in its first and fifth introns, respectively. As is typical for genes encoding ribosomal proteins, there are multiple processed pseudogenes of this gene dispersed through the genome. Multiple alternatively spliced transcript variants encoding different isoforms have been found for this gene. NA
CALM2 805 calmodulin 2 (phosphorylase kinase, delta) ENSG00000143933 This gene is a member of the calmodulin gene family. There are three distinct calmodulin genes dispersed throughout the genome that encode the identical protein, but differ at the nucleotide level. Calmodulin is a calcium binding protein that plays a role in signaling pathways, cell cycle progression and proliferation. Several infants with severe forms of long-QT syndrome (LQTS) who displayed life-threatening ventricular arrhythmias together with delayed neurodevelopment and epilepsy were found to have mutations in either this gene or another member of the calmodulin gene family (PMID:23388215). Mutations in this gene have also been identified in patients with less severe forms of LQTS (PMID:24917665), while mutations in another calmodulin gene family member have been associated with catecholaminergic polymorphic ventricular tachycardia (CPVT)(PMID:23040497), a rare disorder thought to be the cause of a significant fraction of sudden cardiac deaths in young individuals. Pseudogenes of this gene are found on chromosomes 10, 13, and 17. Alternative splicing results in multiple transcript variants encoding different isoforms. NA
CRNN 49860 cornulin ENSG00000143536 This gene encodes a member of the ‘fused gene’ family of proteins, which contain N-terminus EF-hand domains and multiple tandem peptide repeats. The encoded protein contains two EF-hand Ca2+ binding domains in its N-terminus and two glutamine- and threonine-rich 60 amino acid repeats in its C-terminus. This gene, also known as squamous epithelial heat shock protein 53, may play a role in the mucosal/epithelial immune response and epidermal differentiation. NA
MAST3 23031 microtubule associated serine/threonine kinase 3 ENSG00000099308 NA NA
HLA-B 3106 major histocompatibility complex, class I, B ENSG00000234745 HLA-B belongs to the HLA class I heavy chain paralogues. This class I molecule is a heterodimer consisting of a heavy chain and a light chain (beta-2 microglobulin). The heavy chain is anchored in the membrane. Class I molecules play a central role in the immune system by presenting peptides derived from the endoplasmic reticulum lumen. They are expressed in nearly all cells. The heavy chain is approximately 45 kDa and its gene contains 8 exons. Exon 1 encodes the leader peptide, exon 2 and 3 encode the alpha1 and alpha2 domains, which both bind the peptide, exon 4 encodes the alpha3 domain, exon 5 encodes the transmembrane region and exons 6 and 7 encode the cytoplasmic tail. Polymorphisms within exon 2 and exon 3 are responsible for the peptide binding specificity of each class one molecule. Typing for these polymorphisms is routinely done for bone marrow and kidney transplantation. Hundreds of HLA-B alleles have been described. NA
KRT6A 3853 keratin 6A ENSG00000205420 The protein encoded by this gene is a member of the keratin gene family. The type II cytokeratins consist of basic or neutral proteins which are arranged in pairs of heterotypic keratin chains coexpressed during differentiation of simple and stratified epithelial tissues. As many as six of this type II cytokeratin (KRT6) have been identified; the multiplicity of the genes is attributed to successive gene duplication events. The genes are expressed with family members KRT16 and/or KRT17 in the filiform papillae of the tongue, the stratified epithelial lining of oral mucosa and esophagus, the outer root sheath of hair follicles, and the glandular epithelia. This KRT6 gene in particular encodes the most abundant isoform. Mutations in these genes have been associated with pachyonychia congenita. In addition, peptides from the C-terminal region of the protein have antimicrobial activity against bacterial pathogens. The type II cytokeratins are clustered in a region of chromosome 12q12-q13. NA
TAGLN 6876 transgelin ENSG00000149591 The protein encoded by this gene is a transformation and shape-change sensitive actin cross-linking/gelling protein found in fibroblasts and smooth muscle. Its expression is down-regulated in many cell lines, and this down-regulation may be an early and sensitive marker for the onset of transformation. A functional role of this protein is unclear. Two transcript variants encoding the same protein have been found for this gene. NA
FGFR3 2261 fibroblast growth factor receptor 3 ENSG00000068078 This gene encodes a member of the fibroblast growth factor receptor (FGFR) family, with its amino acid sequence being highly conserved between members and among divergent species. FGFR family members differ from one another in their ligand affinities and tissue distribution. A full-length representative protein would consist of an extracellular region, composed of three immunoglobulin-like domains, a single hydrophobic membrane-spanning segment and a cytoplasmic tyrosine kinase domain. The extracellular portion of the protein interacts with fibroblast growth factors, setting in motion a cascade of downstream signals, ultimately influencing mitogenesis and differentiation. This particular family member binds acidic and basic fibroblast growth hormone and plays a role in bone development and maintenance. Mutations in this gene lead to craniosynostosis and multiple types of skeletal dysplasia. Three alternatively spliced transcript variants that encode different protein isoforms have been described. NA
CHGA 1113 chromogranin A ENSG00000100604 The protein encoded by this gene is a member of the chromogranin/secretogranin family of neuroendocrine secretory proteins. It is found in secretory vesicles of neurons and endocrine cells. This gene product is a precursor to three biologically active peptides; vasostatin, pancreastatin, and parastatin. These peptides act as autocrine or paracrine negative modulators of the neuroendocrine system. Two other peptides, catestatin and chromofungin, have antimicrobial activity and antifungal activity, respectively. Two transcript variants encoding different isoforms have been found for this gene. NA
STMN1 3925 stathmin 1 ENSG00000117632 This gene belongs to the stathmin family of genes. It encodes a ubiquitous cytosolic phosphoprotein proposed to function as an intracellular relay integrating regulatory signals of the cellular environment. The encoded protein is involved in the regulation of the microtubule filament system by destabilizing microtubules. It prevents assembly and promotes disassembly of microtubules. Multiple transcript variants encoding different isoforms have been found for this gene. NA
ADH1B 125 alcohol dehydrogenase 1B (class I), beta polypeptide ENSG00000196616 The protein encoded by this gene is a member of the alcohol dehydrogenase family. Members of this enzyme family metabolize a wide variety of substrates, including ethanol, retinol, other aliphatic alcohols, hydroxysteroids, and lipid peroxidation products. This encoded protein, consisting of several homo- and heterodimers of alpha, beta, and gamma subunits, exhibits high activity for ethanol oxidation and plays a major role in ethanol catabolism. Three genes encoding alpha, beta and gamma subunits are tandemly organized in a genomic segment as a gene cluster. Two transcript variants encoding different isoforms have been found for this gene. NA
MT3 4504 metallothionein 3 ENSG00000087250 NA NA
CERCAM 51148 cerebral endothelial cell adhesion molecule ENSG00000167123 NA NA
NEAT1 283131 nuclear paraspeckle assembly transcript 1 (non-protein coding) ENSG00000245532 This gene produces a long non-coding RNA (lncRNA) transcribed from the multiple endocrine neoplasia locus. This lncRNA is retained in the nucleus where it forms the core structural component of the paraspeckle sub-organelles. It may act as a transcriptional regulator for numerous genes, including some genes involved in cancer progression. NA
MTCO1P12 ENSG00000237973 MT-CO1 pseudogene 12 ENSG00000237973 NA NA
HSPG2 3339 heparan sulfate proteoglycan 2 ENSG00000142798 This gene encodes the perlecan protein, which consists of a core protein to which three long chains of glycosaminoglycans (heparan sulfate or chondroitin sulfate) are attached. The perlecan protein is a large multidomain proteoglycan that binds to and cross-links many extracellular matrix components and cell-surface molecules. It has been shown that this protein interacts with laminin, prolargin, collagen type IV, FGFBP1, FBLN2, FGF7 and transthyretin, etc., and it plays essential roles in multiple biological activities. Perlecan is a key component of the vascular extracellular matrix, where it helps to maintain the endothelial barrier function. It is a potent inhibitor of smooth muscle cell proliferation and is thus thought to help maintain vascular homeostasis. It can also promote growth factor (e.g., FGF2) activity and thus stimulate endothelial growth and re-generation. It is a major component of basement membranes, where it is involved in the stabilization of other molecules as well as being involved with glomerular permeability to macromolecules and cell adhesion. Mutations in this gene cause Schwartz-Jampel syndrome type 1, Silverman-Handmaker type of dyssegmental dysplasia, and tardive dyskinesia. Alternative splicing of this gene results in multiple transcript variants. NA
APOD 347 apolipoprotein D ENSG00000189058 This gene encodes a component of high density lipoprotein that has no marked similarity to other apolipoprotein sequences. It has a high degree of homology to plasma retinol-binding protein and other members of the alpha 2 microglobulin protein superfamily of carrier proteins, also known as lipocalins. This glycoprotein is closely associated with the enzyme lecithin:cholesterol acyltransferase - an enzyme involved in lipoprotein metabolism. NA
ODF2 4957 outer dense fiber of sperm tails 2 ENSG00000136811 The outer dense fibers are cytoskeletal structures that surround the axoneme in the middle piece and principal piece of the sperm tail. The fibers function in maintaining the elastic structure and recoil of the sperm tail as well as in protecting the tail from shear forces during epididymal transport and ejaculation. Defects in the outer dense fibers lead to abnormal sperm morphology and infertility. This gene encodes one of the major outer dense fiber proteins. Alternative splicing results in multiple transcript variants. The longer transcripts, also known as ‘Cenexins’, encode proteins with a C-terminal extension that are differentially targeted to somatic centrioles and thought to be crucial for the formation of microtubule organizing centers. NA
NA NA NA ENSG00000259716 NA TRUE
KRT5 3852 keratin 5 ENSG00000186081 The protein encoded by this gene is a member of the keratin gene family. The type II cytokeratins consist of basic or neutral proteins which are arranged in pairs of heterotypic keratin chains coexpressed during differentiation of simple and stratified epithelial tissues. This type II cytokeratin is specifically expressed in the basal layer of the epidermis with family member KRT14. Mutations in these genes have been associated with a complex of diseases termed epidermolysis bullosa simplex. The type II cytokeratins are clustered in a region of chromosome 12q12-q13. NA
PLIN2 123 perilipin 2 ENSG00000147872 The protein encoded by this gene belongs to the perilipin family, members of which coat intracellular lipid storage droplets. This protein is associated with the lipid globule surface membrane material, and may be involved in development and maintenance of adipose tissue. However, it is not restricted to adipocytes as previously thought, but is found in a wide range of cultured cell lines, including fibroblasts, endothelial and epithelial cells, and tissues, such as lactating mammary gland, adrenal cortex, Sertoli and Leydig cells, and hepatocytes in alcoholic liver cirrhosis, suggesting that it may serve as a marker of lipid accumulation in diverse cell types and diseases. Alternatively spliced transcript variants have been found for this gene. NA
RHCG 51458 Rh family C glycoprotein ENSG00000140519 NA NA
ENHO 375704 energy homeostasis associated ENSG00000168913 NA NA
RPL13A 23521 ribosomal protein L13a ENSG00000142541 Ribosomes, the organelles that catalyze protein synthesis, consist of a small 40S subunit and a large 60S subunit. Together these subunits are composed of 4 RNA species and approximately 80 structurally distinct proteins. This gene encodes a member of the L13P family of ribosomal proteins that is a component of the 60S subunit. The encoded protein also plays a role in the repression of inflammatory genes as a component of the IFN-gamma-activated inhibitor of translation (GAIT) complex. This gene is co-transcribed with the small nucleolar RNA genes U32, U33, U34, and U35, which are located in the second, fourth, fifth, and sixth introns, respectively. As is typical for genes encoding ribosomal proteins, there are multiple processed pseudogenes of this gene dispersed throughout the genome. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene. NA
FASN 2194 fatty acid synthase ENSG00000169710 The enzyme encoded by this gene is a multifunctional protein. Its main function is to catalyze the synthesis of palmitate from acetyl-CoA and malonyl-CoA, in the presence of NADPH, into long-chain saturated fatty acids. In some cancer cell lines, this protein has been found to be fused with estrogen receptor-alpha (ER-alpha), in which the N-terminus of FAS is fused in-frame with the C-terminus of ER-alpha. NA
ACTA1 58 actin, alpha 1, skeletal muscle ENSG00000143632 The product encoded by this gene belongs to the actin family of proteins, which are highly conserved proteins that play a role in cell motility, structure and integrity. Alpha, beta and gamma actin isoforms have been identified, with alpha actins being a major constituent of the contractile apparatus, while beta and gamma actins are involved in the regulation of cell motility. This actin is an alpha actin that is found in skeletal muscle. Mutations in this gene cause nemaline myopathy type 3, congenital myopathy with excess of thin myofilaments, congenital myopathy with cores, and congenital myopathy with fiber-type disproportion, diseases that lead to muscle fiber defects. NA
RPS6 6194 ribosomal protein S6 ENSG00000137154 Ribosomes, the organelles that catalyze protein synthesis, consist of a small 40S subunit and a large 60S subunit. Together these subunits are composed of 4 RNA species and approximately 80 structurally distinct proteins. This gene encodes a cytoplasmic ribosomal protein that is a component of the 40S subunit. The protein belongs to the S6E family of ribosomal proteins. It is the major substrate of protein kinases in the ribosome, with subsets of five C-terminal serine residues phosphorylated by different protein kinases. Phosphorylation is induced by a wide range of stimuli, including growth factors, tumor-promoting agents, and mitogens. Dephosphorylation occurs at growth arrest. The protein may contribute to the control of cell growth and proliferation through the selective translation of particular classes of mRNA. As is typical for genes encoding ribosomal proteins, there are multiple processed pseudogenes of this gene dispersed through the genome. NA
PPL 5493 periplakin ENSG00000118898 The protein encoded by this gene is a component of desmosomes and of the epidermal cornified envelope in keratinocytes. The N-terminal domain of this protein interacts with the plasma membrane and its C-terminus interacts with intermediate filaments. Through its rod domain, this protein forms complexes with envoplakin. This protein may serve as a link between the cornified envelope and desmosomes as well as intermediate filaments. AKT1/PKB, a protein kinase mediating a variety of cell growth and survival signaling processes, is reported to interact with this protein, suggesting a possible role for this protein as a localization signal in AKT1-mediated signaling. NA
CELA2B 51032 chymotrypsin like elastase family member 2B ENSG00000215704 Elastases form a subfamily of serine proteases that hydrolyze many proteins in addition to elastin. Humans have six elastase genes which encode the structurally similar proteins elastase 1, 2, 2A, 2B, 3A, and 3B. Like most of the human elastases, elastase 2B is secreted from the pancreas as a zymogen. In other species, elastase 2B has been shown to preferentially cleave proteins after leucine, methionine, and phenylalanine residues. NA
GNG7 2788 G protein subunit gamma 7 ENSG00000176533 NA NA
FBXL16 146330 F-box and leucine rich repeat protein 16 ENSG00000127585 Members of the F-box protein family, such as FBXL16, are characterized by an approximately 40-amino acid F-box motif. SCF complexes, formed by SKP1 (MIM 601434), cullin (see CUL1; MIM 603134), and F-box proteins, act as protein-ubiquitin ligases. F-box proteins interact with SKP1 through the F box, and they interact with ubiquitination targets through other protein interaction domains (Jin et al., 2004 [PubMed 15520277]). NA
SPINT2 10653 serine peptidase inhibitor, Kunitz type, 2 ENSG00000167642 This gene encodes a transmembrane protein with two extracellular Kunitz domains that inhibits a variety of serine proteases. The protein inhibits HGF activator which prevents the formation of active hepatocyte growth factor. This gene is a putative tumor suppressor, and mutations in this gene result in congenital sodium diarrhea. Multiple transcript variants encoding different isoforms have been found for this gene. NA
JUP 3728 junction plakoglobin ENSG00000173801 This gene encodes a major cytoplasmic protein which is the only known constituent common to submembranous plaques of both desmosomes and intermediate junctions. This protein forms distinct complexes with cadherins and desmosomal cadherins and is a member of the catenin family since it contains a distinct repeating amino acid motif called the armadillo repeat. Mutation in this gene has been associated with Naxos disease. Alternative splicing occurs in this gene; however, not all transcripts have been fully described. NA
LOC105372824 105372824 uncharacterized LOC105372824 ENSG00000160209 NA NA
PDXK 8566 pyridoxal (pyridoxine, vitamin B6) kinase ENSG00000160209 The protein encoded by this gene phosphorylates vitamin B6, a step required for the conversion of vitamin B6 to pyridoxal-5-phosphate, an important cofactor in intermediary metabolism. The encoded protein is cytoplasmic and probably acts as a homodimer. Alternatively spliced transcript variants have been described, but their biological validity has not been determined. NA
NRGN 4900 neurogranin ENSG00000154146 Neurogranin (NRGN) is the human homolog of the neuron-specific rat RC3/neurogranin gene. This gene encodes a postsynaptic protein kinase substrate that binds calmodulin in the absence of calcium. The NRGN gene contains four exons and three introns. Exons 1 and 2 encode the protein and exons 3 and 4 contain untranslated sequences. It is suggested that NRGN is a direct target for thyroid hormone in human brain, and that control of expression of this gene could underlie many of the consequences of hypothyroidism on mental states during development as well as in adult subjects. NA
FAM107A 11170 family with sequence similarity 107 member A ENSG00000168309 NA NA
VCAN 1462 versican ENSG00000038427 This gene is a member of the aggrecan/versican proteoglycan family. The protein encoded is a large chondroitin sulfate proteoglycan and is a major component of the extracellular matrix. This protein is involved in cell adhesion, proliferation, migration and angiogenesis and plays a central role in tissue morphogenesis and maintenance. Mutations in this gene are the cause of Wagner syndrome type 1. Multiple transcript variants encoding different isoforms have been found for this gene. NA
CKB 1152 creatine kinase B ENSG00000166165 The protein encoded by this gene is a cytoplasmic enzyme involved in energy homeostasis. The encoded protein reversibly catalyzes the transfer of phosphate between ATP and various phosphogens such as creatine phosphate. It acts as a homodimer in brain as well as in other tissues, and as a heterodimer with a similar muscle isozyme in heart. The encoded protein is a member of the ATP:guanido phosphotransferase protein family. A pseudogene of this gene has been characterized. NA
DSTN 11034 destrin, actin depolymerizing factor ENSG00000125868 The product of this gene belongs to the actin-binding proteins ADF family. This family of proteins is responsible for enhancing the turnover rate of actin in vivo. This gene encodes the actin depolymerizing protein that severs actin filaments (F-actin) and binds to actin monomers (G-actin). Two transcript variants encoding distinct isoforms have been identified for this gene. NA
PLXNB1 5364 plexin B1 ENSG00000164050 NA NA
ALDOA 226 aldolase, fructose-bisphosphate A ENSG00000149925 The protein encoded by this gene, Aldolase A (fructose-bisphosphate aldolase), is a glycolytic enzyme that catalyzes the reversible conversion of fructose-1,6-bisphosphate to glyceraldehyde 3-phosphate and dihydroxyacetone phosphate. Three aldolase isozymes (A, B, and C), encoded by three different genes, are differentially expressed during development. Aldolase A is found in the developing embryo and is produced in even greater amounts in adult muscle. Aldolase A expression is repressed in adult liver, kidney and intestine and similar to aldolase C levels in brain and other nervous tissue. Aldolase A deficiency has been associated with myopathy and hemolytic anemia. Alternative splicing and alternative promoter usage results in multiple transcript variants. Related pseudogenes have been identified on chromosomes 3 and 10. NA
FSTL1 11167 follistatin like 1 ENSG00000163430 This gene encodes a protein with similarity to follistatin, an activin-binding protein. It contains an FS module, a follistatin-like sequence containing 10 conserved cysteine residues. This gene product is thought to be an autoantigen associated with rheumatoid arthritis. NA
MGST1 4257 microsomal glutathione S-transferase 1 ENSG00000008394 The MAPEG (Membrane Associated Proteins in Eicosanoid and Glutathione metabolism) family consists of six human proteins, two of which are involved in the production of leukotrienes and prostaglandin E, important mediators of inflammation. Other family members, demonstrating glutathione S-transferase and peroxidase activities, are involved in cellular defense against toxic, carcinogenic, and pharmacologically active electrophilic compounds. This gene encodes a protein that catalyzes the conjugation of glutathione to electrophiles and the reduction of lipid hydroperoxides. This protein is localized to the endoplasmic reticulum and outer mitochondrial membrane where it is thought to protect these membranes from oxidative stress. Several transcript variants, some non-protein coding and some protein coding, have been found for this gene. NA
GAPDH 2597 glyceraldehyde-3-phosphate dehydrogenase ENSG00000111640 This gene encodes a member of the glyceraldehyde-3-phosphate dehydrogenase protein family. The encoded protein has been identified as a moonlighting protein based on its ability to perform mechanistically distinct functions. The product of this gene catalyzes an important energy-yielding step in carbohydrate metabolism, the reversible oxidative phosphorylation of glyceraldehyde-3-phosphate in the presence of inorganic phosphate and nicotinamide adenine dinucleotide (NAD). The encoded protein has additionally been identified to have uracil DNA glycosylase activity in the nucleus. Also, this protein contains a peptide that has antimicrobial activity against E. coli, P. aeruginosa, and C. albicans. Studies of a similar protein in mouse have assigned a variety of additional functions including nitrosylation of nuclear proteins, the regulation of mRNA stability, and acting as a transferrin receptor on the cell surface of macrophage. Many pseudogenes similar to this locus are present in the human genome. Alternative splicing results in multiple transcript variants. NA
MYRF 745 myelin regulatory factor ENSG00000124920 This gene encodes a transcription factor that is required for central nervous system myelination and may regulate oligodendrocyte differentiation. It is thought to act by increasing the expression of genes that effect myelin production but may also directly promote myelin gene expression. Loss of a similar gene in mouse models results in severe demyelination. Alternative splicing results in multiple transcript variants. NA
SYNPO2 171024 synaptopodin 2 ENSG00000172403 NA NA
SPOCK2 9806 sparc/osteonectin, cwcv and kazal-like domains proteoglycan (testican) 2 ENSG00000107742 This gene encodes a protein which binds with glycosaminoglycans to form part of the extracellular matrix. The protein contains thyroglobulin type-1, follistatin-like, and calcium-binding domains, and has glycosaminoglycan attachment sites in the acidic C-terminal region. Three alternatively spliced transcript variants that encode different protein isoforms have been described for this gene. NA
# Save the queried Ensembl IDs for cluster 15, one per line, with no header or row names.
write.table(out$query,
            paste0("../utilities/GTEX2013_sparse_load_sqrt/gene_names_clus_", 15, ".txt"),
            col.names = FALSE, row.names = FALSE, quote = FALSE)
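
The annotation tables can contain the same query ID more than once (for example, ENSG00000160209 maps to both LOC105372824 and PDXK) as well as IDs flagged `notfound` (for example, ENSG00000259716). If the exported gene list is meant to hold each annotated gene once, a small filter before `write.table` may help. The sketch below is only an illustration: `clean_query_ids` is a hypothetical helper that is not part of the original analysis, and the toy data frame stands in for a real `queryMany()` result.

```r
# Hypothetical helper (not part of the original analysis): keep each queried
# Ensembl ID once and drop IDs that the query flagged as not found.
clean_query_ids <- function(out) {
  df <- as.data.frame(out)
  # The notfound column is assumed to be present only when some query failed.
  if ("notfound" %in% names(df)) {
    missing <- df$notfound %in% TRUE   # NA entries count as "found"
  } else {
    missing <- rep(FALSE, nrow(df))
  }
  unique(as.character(df$query[!missing]))
}

# Toy stand-in for a queryMany() result, using IDs that appear in the tables.
toy <- data.frame(
  query    = c("ENSG00000160209", "ENSG00000160209",
               "ENSG00000259716", "ENSG00000186081"),
  symbol   = c("LOC105372824", "PDXK", NA, "KRT5"),
  notfound = c(NA, NA, TRUE, NA),
  stringsAsFactors = FALSE
)

ids <- clean_query_ids(toy)
print(ids)
```

The resulting vector could then be passed to `write.table` in place of `out$query`. Whether unmatched IDs should be dropped or kept depends on how the exported file is used downstream, which this document does not state.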

Factor 16 Annotations

out <- mygene::queryMany(gene_list[16,],  scopes="ensembl.gene", fields=c("name", "summary", "symbol"), species="human");
## Finished
## Pass returnall=TRUE to return lists of duplicate or missing query terms.
kable(as.data.frame(out))
query symbol summary X_id name notfound
ENSG00000042832 TG Thyroglobulin (Tg) is a glycoprotein homodimer produced predominantly by the thyroid gland. It acts as a substrate for the synthesis of thyroxine and triiodothyronine as well as the storage of the inactive forms of thyroid hormone and iodine. Thyroglobulin is secreted from the endoplasmic reticulum to its site of iodination, and subsequent thyroxine biosynthesis, in the follicular lumen. Mutations in this gene cause thyroid dyshormonogenesis, manifested as goiter, and are associated with moderate to severe congenital hypothyroidism. Polymorphisms in this gene are associated with susceptibility to autoimmune thyroid diseases (AITD) such as Graves disease and Hashimoto thyroiditis. 7038 thyroglobulin NA
ENSG00000115705 TPO This gene encodes a membrane-bound glycoprotein. The encoded protein acts as an enzyme and plays a central role in thyroid gland function. The protein functions in the iodination of tyrosine residues in thyroglobulin and phenoxy-ester formation between pairs of iodinated tyrosines to generate the thyroid hormones, thyroxine and triiodothyronine. Mutations in this gene are associated with several disorders of thyroid hormonogenesis, including congenital hypothyroidism, congenital goiter, and thyroid hormone organification defect IIA. Multiple transcript variants encoding distinct isoforms have been identified for this gene, but the full-length nature of some variants has not been determined. 7173 thyroid peroxidase NA
ENSG00000163631 ALB Albumin is a soluble, monomeric protein which comprises about one-half of the blood serum protein. Albumin functions primarily as a carrier protein for steroids, fatty acids, and thyroid hormones and plays a role in stabilizing extracellular fluid volume. Albumin is a globular unglycosylated serum protein of molecular weight 65,000. Albumin is synthesized in the liver as preproalbumin which has an N-terminal peptide that is removed before the nascent protein is released from the rough endoplasmic reticulum. The product, proalbumin, is in turn cleaved in the Golgi vesicles to produce the secreted albumin. 213 albumin NA
ENSG00000125618 PAX8 This gene encodes a member of the paired box (PAX) family of transcription factors. Members of this gene family typically encode proteins that contain a paired box domain, an octapeptide, and a paired-type homeodomain. This nuclear protein is involved in thyroid follicular cell development and expression of thyroid-specific genes. Mutations in this gene have been associated with thyroid dysgenesis, thyroid follicular carcinomas and atypical follicular thyroid adenomas. Alternatively spliced transcript variants encoding different isoforms have been described. 7849 paired box 8 NA
ENSG00000257017 HP This gene encodes a preproprotein, which is processed to yield both alpha and beta chains, which subsequently combine as a tetramer to produce haptoglobin. Haptoglobin functions to bind free plasma hemoglobin, which allows degradative enzymes to gain access to the hemoglobin, while at the same time preventing loss of iron through the kidneys and protecting the kidneys from damage by hemoglobin. Mutations in this gene and/or its regulatory regions cause ahaptoglobinemia or hypohaptoglobinemia. This gene has also been linked to diabetic nephropathy, the incidence of coronary artery disease in type 1 diabetes, Crohn’s disease, inflammatory disease behavior, primary sclerosing cholangitis, susceptibility to idiopathic Parkinson’s disease, and a reduced incidence of Plasmodium falciparum malaria. The protein encoded also exhibits antimicrobial activity against bacteria. A similar duplicated gene is located next to this gene on chromosome 16. Multiple transcript variants encoding different isoforms have been found for this gene. 3240 haptoglobin NA
ENSG00000171560 FGA This gene encodes the alpha subunit of the coagulation factor fibrinogen, which is a component of the blood clot. Following vascular injury, the encoded preproprotein is proteolytically processed by thrombin during the conversion of fibrinogen to fibrin. Mutations in this gene lead to several disorders, including dysfibrinogenemia, hypofibrinogenemia, afibrinogenemia and renal amyloidosis. Alternative splicing results in multiple transcript variants, at least one of which encodes an isoform that undergoes proteolytic processing. 2243 fibrinogen alpha chain NA
ENSG00000171564 FGB The protein encoded by this gene is the beta component of fibrinogen, a blood-borne glycoprotein comprised of three pairs of nonidentical polypeptide chains. Following vascular injury, fibrinogen is cleaved by thrombin to form fibrin which is the most abundant component of blood clots. In addition, various cleavage products of fibrinogen and fibrin regulate cell adhesion and spreading, display vasoconstrictor and chemotactic activities, and are mitogens for several cell types. Mutations in this gene lead to several disorders, including afibrinogenemia, dysfibrinogenemia, hypodysfibrinogenemia and thrombotic tendency. Alternatively spliced transcript variants encoding different isoforms have been found for this gene. 2244 fibrinogen beta chain NA
ENSG00000125730 C3 Complement component C3 plays a central role in the activation of complement system. Its activation is required for both classical and alternative complement activation pathways. The encoded preproprotein is proteolytically processed to generate alpha and beta subunits that form the mature protein, which is then further processed to generate numerous peptide products. The C3a peptide, also known as the C3a anaphylatoxin, modulates inflammation and possesses antimicrobial activity. Mutations in this gene are associated with atypical hemolytic uremic syndrome and age-related macular degeneration in human patients. 718 complement component 3 NA
ENSG00000229314 ORM1 This gene encodes a key acute phase plasma protein. Because of its increase due to acute inflammation, this protein is classified as an acute-phase reactant. The specific function of this protein has not yet been determined; however, it may be involved in aspects of immunosuppression. 5004 orosomucoid 1 NA
ENSG00000107317 PTGDS The protein encoded by this gene is a glutathione-independent prostaglandin D synthase that catalyzes the conversion of prostaglandin H2 (PGH2) to prostaglandin D2 (PGD2). PGD2 functions as a neuromodulator as well as a trophic factor in the central nervous system. PGD2 is also involved in smooth muscle contraction/relaxation and is a potent inhibitor of platelet aggregation. This gene is preferentially expressed in brain. Studies with transgenic mice overexpressing this gene suggest that this gene may be also involved in the regulation of non-rapid eye movement sleep. 5730 prostaglandin D2 synthase NA
ENSG00000171557 FGG The protein encoded by this gene is the gamma component of fibrinogen, a blood-borne glycoprotein comprised of three pairs of nonidentical polypeptide chains. Following vascular injury, fibrinogen is cleaved by thrombin to form fibrin which is the most abundant component of blood clots. In addition, various cleavage products of fibrinogen and fibrin regulate cell adhesion and spreading, display vasoconstrictor and chemotactic activities, and are mitogens for several cell types. Mutations in this gene lead to several disorders, including dysfibrinogenemia, hypofibrinogenemia and thrombophilia. Alternative splicing results in transcript variants encoding different isoforms. 2266 fibrinogen gamma chain NA
ENSG00000197971 MBP The protein encoded by the classic MBP gene is a major constituent of the myelin sheath of oligodendrocytes and Schwann cells in the nervous system. However, MBP-related transcripts are also present in the bone marrow and the immune system. These mRNAs arise from the long MBP gene (otherwise called ‘Golli-MBP’) that contains 3 additional exons located upstream of the classic MBP exons. Alternative splicing from the Golli and the MBP transcription start sites gives rise to 2 sets of MBP-related transcripts and gene products. The Golli mRNAs contain 3 exons unique to Golli-MBP, spliced in-frame to 1 or more MBP exons. They encode hybrid proteins that have N-terminal Golli aa sequence linked to MBP aa sequence. The second family of transcripts contain only MBP exons and produce the well characterized myelin basic proteins. This complex gene structure is conserved among species suggesting that the MBP transcription unit is an integral part of the Golli transcription unit and that this arrangement is important for the function and/or regulation of these genes. 4155 myelin basic protein NA
ENSG00000132693 CRP The protein encoded by this gene belongs to the pentaxin family. It is involved in several host defense related functions based on its ability to recognize foreign pathogens and damaged cells of the host and to initiate their elimination by interacting with humoral and cellular effector systems in the blood. Consequently, the level of this protein in plasma increases greatly during acute phase response to tissue injury, infection, or other inflammatory stimuli. 1401 C-reactive protein, pentraxin-related NA
ENSG00000090920 NA NA NA NA TRUE
ENSG00000111275 ALDH2 This protein belongs to the aldehyde dehydrogenase family of proteins. Aldehyde dehydrogenase is the second enzyme of the major oxidative pathway of alcohol metabolism. Two major liver isoforms of aldehyde dehydrogenase, cytosolic and mitochondrial, can be distinguished by their electrophoretic mobilities, kinetic properties, and subcellular localizations. Most Caucasians have two major isozymes, while approximately 50% of Orientals have the cytosolic isozyme but not the mitochondrial isozyme. A remarkably higher frequency of acute alcohol intoxication among Orientals than among Caucasians could be related to the absence of a catalytically active form of the mitochondrial isozyme. The increased exposure to acetaldehyde in individuals with the catalytically inactive form may also confer greater susceptibility to many types of cancer. This gene encodes a mitochondrial isoform, which has a low Km for acetaldehydes, and is localized in mitochondrial matrix. Alternative splicing results in multiple transcript variants encoding distinct isoforms. 217 aldehyde dehydrogenase 2 family (mitochondrial) NA
ENSG00000164733 CTSB This gene encodes a member of the C1 family of peptidases. Alternative splicing of this gene results in multiple transcript variants. At least one of these variants encodes a preproprotein that is proteolytically processed to generate multiple protein products. These products include the cathepsin B light and heavy chains, which can dimerize to form the double chain form of the enzyme. This enzyme is a lysosomal cysteine protease with both endopeptidase and exopeptidase activity that may play a role in protein turnover. It is also known as amyloid precursor protein secretase and is involved in the proteolytic processing of amyloid precursor protein (APP). Incomplete proteolytic processing of APP has been suggested to be a causative factor in Alzheimer’s disease, the most common cause of dementia. Overexpression of the encoded protein has been associated with esophageal adenocarcinoma and other tumors. Multiple pseudogenes of this gene have been identified. 1508 cathepsin B NA
ENSG00000160180 TFF3 Members of the trefoil family are characterized by having at least one copy of the trefoil motif, a 40-amino acid domain that contains three conserved disulfides. They are stable secretory proteins expressed in gastrointestinal mucosa. Their functions are not defined, but they may protect the mucosa from insults, stabilize the mucus layer and affect healing of the epithelium. This gene is expressed in goblet cells of the intestines and colon. This gene and two other related trefoil family member genes are found in a cluster on chromosome 21. 7033 trefoil factor 3 NA
ENSG00000135046 ANXA1 This gene encodes a membrane-localized protein that binds phospholipids. This protein inhibits phospholipase A2 and has anti-inflammatory activity. Loss of function or expression of this gene has been detected in multiple tumors. 301 annexin A1 NA
ENSG00000134531 EMP1 NA 2012 epithelial membrane protein 1 NA
ENSG00000111640 GAPDH This gene encodes a member of the glyceraldehyde-3-phosphate dehydrogenase protein family. The encoded protein has been identified as a moonlighting protein based on its ability to perform mechanistically distinct functions. The product of this gene catalyzes an important energy-yielding step in carbohydrate metabolism, the reversible oxidative phosphorylation of glyceraldehyde-3-phosphate in the presence of inorganic phosphate and nicotinamide adenine dinucleotide (NAD). The encoded protein has additionally been identified to have uracil DNA glycosylase activity in the nucleus. Also, this protein contains a peptide that has antimicrobial activity against E. coli, P. aeruginosa, and C. albicans. Studies of a similar protein in mouse have assigned a variety of additional functions including nitrosylation of nuclear proteins, the regulation of mRNA stability, and acting as a transferrin receptor on the cell surface of macrophage. Many pseudogenes similar to this locus are present in the human genome. Alternative splicing results in multiple transcript variants. 2597 glyceraldehyde-3-phosphate dehydrogenase NA
ENSG00000110245 APOC3 Apolipoprotein C-III is a very low density lipoprotein (VLDL) protein. APOC3 inhibits lipoprotein lipase and hepatic lipase; it is thought to delay catabolism of triglyceride-rich particles. The APOA1, APOC3 and APOA4 genes are closely linked in both rat and human genomes. The A-I and A-IV genes are transcribed from the same strand, while the A-I and C-III genes are convergently transcribed. An increase in apoC-III levels induces the development of hypertriglyceridemia. 345 apolipoprotein C3 NA
ENSG00000130600 H19 This gene is located in an imprinted region of chromosome 11 near the insulin-like growth factor 2 (IGF2) gene. This gene is only expressed from the maternally-inherited chromosome, whereas IGF2 is only expressed from the paternally-inherited chromosome. The product of this gene is a long non-coding RNA which functions as a tumor suppressor. Mutations in this gene have been associated with Beckwith-Wiedemann Syndrome and Wilms tumorigenesis. Alternative splicing results in multiple transcript variants. 283120 H19, imprinted maternally expressed transcript (non-protein coding) NA
ENSG00000151726 ACSL1 The protein encoded by this gene is an isozyme of the long-chain fatty-acid-coenzyme A ligase family. Although differing in substrate specificity, subcellular localization, and tissue distribution, all isozymes of this family convert free long-chain fatty acids into fatty acyl-CoA esters, and thereby play a key role in lipid biosynthesis and fatty acid degradation. Several transcript variants encoding different isoforms have been found for this gene. 2180 acyl-CoA synthetase long-chain family member 1 NA
ENSG00000117984 CTSD This gene encodes a member of the A1 family of peptidases. The encoded preproprotein is proteolytically processed to generate multiple protein products. These products include the cathepsin D light and heavy chains, which heterodimerize to form the mature enzyme. This enzyme exhibits pepsin-like activity and plays a role in protein turnover and in the proteolytic activation of hormones and growth factors. Mutations in this gene play a causal role in neuronal ceroid lipofuscinosis-10 and may be involved in the pathogenesis of several other diseases, including breast cancer and possibly Alzheimer’s disease. 1509 cathepsin D NA
ENSG00000122304 PRM2 Protamines substitute for histones in the chromatin of sperm during the haploid phase of spermatogenesis, and are the major DNA-binding proteins in the nucleus of sperm in many vertebrates. They package the sperm DNA into a highly condensed complex in a volume less than 5% of a somatic cell nucleus. Many mammalian species have only one protamine (protamine 1); however, a few species, including human and mouse, have two. This gene encodes protamine 2, which is cleaved to give rise to a family of protamine 2 peptides. Alternatively spliced transcript variants have also been found for this gene. 5620 protamine 2 NA
ENSG00000091583 APOH Apolipoprotein H has been implicated in a variety of physiologic pathways including lipoprotein metabolism, coagulation, and the production of antiphospholipid autoantibodies. APOH may be a required cofactor for anionic phospholipid binding by the antiphospholipid autoantibodies found in sera of many patients with lupus and primary antiphospholipid syndrome, but it does not seem to be required for the reactivity of antiphospholipid autoantibodies associated with infections. 350 apolipoprotein H NA
ENSG00000185133 INPP5J NA 27124 inositol polyphosphate-5-phosphatase J NA
ENSG00000018625 ATP1A2 The protein encoded by this gene belongs to the family of P-type cation transport ATPases, and to the subfamily of Na+/K+ -ATPases. Na+/K+ -ATPase is an integral membrane protein responsible for establishing and maintaining the electrochemical gradients of Na and K ions across the plasma membrane. These gradients are essential for osmoregulation, for sodium-coupled transport of a variety of organic and inorganic molecules, and for electrical excitability of nerve and muscle. This enzyme is composed of two subunits, a large catalytic subunit (alpha) and a smaller glycoprotein subunit (beta). The catalytic subunit of Na+/K+ -ATPase is encoded by multiple genes. This gene encodes an alpha 2 subunit. Mutations in this gene result in familial basilar or hemiplegic migraines, and in a rare syndrome known as alternating hemiplegia of childhood. 477 ATPase Na+/K+ transporting subunit alpha 2 NA
ENSG00000170315 UBB This gene encodes ubiquitin, one of the most conserved proteins known. Ubiquitin has a major role in targeting cellular proteins for degradation by the 26S proteasome. It is also involved in the maintenance of chromatin structure, the regulation of gene expression, and the stress response. Ubiquitin is synthesized as a precursor protein consisting of either polyubiquitin chains or a single ubiquitin moiety fused to an unrelated protein. This gene consists of three direct repeats of the ubiquitin coding sequence with no spacer sequence. Consequently, the protein is expressed as a polyubiquitin precursor with a final amino acid after the last repeat. An aberrant form of this protein has been detected in patients with Alzheimer’s disease and Down syndrome. Pseudogenes of this gene are located on chromosomes 1, 2, 13, and 17. Alternative splicing results in multiple transcript variants. 7314 ubiquitin B NA
ENSG00000168743 NPNT NA 255743 nephronectin NA
ENSG00000026025 VIM This gene encodes a member of the intermediate filament family. Intermediate filaments, along with microtubules and actin microfilaments, make up the cytoskeleton. The protein encoded by this gene is responsible for maintaining cell shape, integrity of the cytoplasm, and stabilizing cytoskeletal interactions. It is also involved in the immune response, and controls the transport of low-density lipoprotein (LDL)-derived cholesterol from a lysosome to the site of esterification. It functions as an organizer of a number of critical proteins involved in attachment, migration, and cell signaling. Mutations in this gene cause a dominant, pulverulent cataract. 7431 vimentin NA
ENSG00000101670 LIPG The protein encoded by this gene has substantial phospholipase activity and may be involved in lipoprotein metabolism and vascular biology. This protein is designated a member of the TG lipase family by its sequence and characteristic lid region which provides substrate specificity for enzymes of the TG lipase family. 9388 lipase G, endothelial type NA
ENSG00000175084 DES This gene encodes a muscle-specific class III intermediate filament. Homopolymers of this protein form a stable intracytoplasmic filamentous network connecting myofibrils to each other and to the plasma membrane. Mutations in this gene are associated with desmin-related myopathy, a familial cardiac and skeletal myopathy (CSM), and with distal myopathies. 1674 desmin NA
ENSG00000155657 TTN This gene encodes a large abundant protein of striated muscle. The product of this gene is divided into two regions, a N-terminal I-band and a C-terminal A-band. The I-band, which is the elastic part of the molecule, contains two regions of tandem immunoglobulin domains on either side of a PEVK region that is rich in proline, glutamate, valine and lysine. The A-band, which is thought to act as a protein-ruler, contains a mixture of immunoglobulin and fibronectin repeats, and possesses kinase activity. An N-terminal Z-disc region and a C-terminal M-line region bind to the Z-line and M-line of the sarcomere, respectively, so that a single titin molecule spans half the length of a sarcomere. Titin also contains binding sites for muscle associated proteins so it serves as an adhesion template for the assembly of contractile machinery in muscle cells. It has also been identified as a structural protein for chromosomes. Alternative splicing of this gene results in multiple transcript variants. Considerable variability exists in the I-band, the M-line and the Z-disc regions of titin. Variability in the I-band region contributes to the differences in elasticity of different titin isoforms and, therefore, to the differences in elasticity of different muscle types. Mutations in this gene are associated with familial hypertrophic cardiomyopathy 9, and autoantibodies to titin are produced in patients with the autoimmune disease scleroderma. 7273 titin NA
ENSG00000135929 CYP27A1 This gene encodes a member of the cytochrome P450 superfamily of enzymes. The cytochrome P450 proteins are monooxygenases which catalyze many reactions involved in drug metabolism and synthesis of cholesterol, steroids and other lipids. This mitochondrial protein oxidizes cholesterol intermediates as part of the bile synthesis pathway. Since the conversion of cholesterol to bile acids is the major route for removing cholesterol from the body, this protein is important for overall cholesterol homeostasis. Mutations in this gene cause cerebrotendinous xanthomatosis, a rare autosomal recessive lipid storage disease. 1593 cytochrome P450 family 27 subfamily A member 1 NA
ENSG00000106927 AMBP This gene encodes a complex glycoprotein secreted in plasma. The precursor is proteolytically processed into distinct functioning proteins: alpha-1-microglobulin, which belongs to the superfamily of lipocalin transport proteins and may play a role in the regulation of inflammatory processes, and bikunin, which is a urinary trypsin inhibitor belonging to the superfamily of Kunitz-type protease inhibitors and plays an important role in many physiological and pathological processes. This gene is located on chromosome 9 in a cluster of lipocalin genes. 259 alpha-1-microglobulin/bikunin precursor NA
ENSG00000174437 ATP2A2 This gene encodes one of the SERCA Ca(2+)-ATPases, which are intracellular pumps located in the sarcoplasmic or endoplasmic reticula of muscle cells. This enzyme catalyzes the hydrolysis of ATP coupled with the translocation of calcium from the cytosol into the sarcoplasmic reticulum lumen, and is involved in regulation of the contraction/relaxation cycle. Mutations in this gene cause Darier-White disease, also known as keratosis follicularis, an autosomal dominant skin disorder characterized by loss of adhesion between epidermal cells and abnormal keratinization. Alternative splicing results in multiple transcript variants encoding different isoforms. 488 ATPase sarcoplasmic/endoplasmic reticulum Ca2+ transporting 2 NA
ENSG00000073849 ST6GAL1 This gene encodes a member of glycosyltransferase family 29. The encoded protein is a type II membrane protein that catalyzes the transfer of sialic acid from CMP-sialic acid to galactose-containing substrates. The protein, which is normally found in the Golgi but can be proteolytically processed to a soluble form, is involved in the generation of the cell-surface carbohydrate determinants and differentiation antigens HB-6, CD75, and CD76. This gene has been incorrectly referred to as CD75. Three transcript variants encoding two different isoforms have been described. 6480 ST6 beta-galactoside alpha-2,6-sialyltransferase 1 NA
ENSG00000115414 FN1 This gene encodes fibronectin, a glycoprotein present in a soluble dimeric form in plasma, and in a dimeric or multimeric form at the cell surface and in extracellular matrix. The encoded preproprotein is proteolytically processed to generate the mature protein. Fibronectin is involved in cell adhesion and migration processes including embryogenesis, wound healing, blood coagulation, host defense, and metastasis. The gene has three regions subject to alternative splicing, with the potential to produce 20 different transcript variants, at least one of which encodes an isoform that undergoes proteolytic processing. The full-length nature of some variants has not been determined. 2335 fibronectin 1 NA
ENSG00000147872 PLIN2 The protein encoded by this gene belongs to the perilipin family, members of which coat intracellular lipid storage droplets. This protein is associated with the lipid globule surface membrane material, and may be involved in development and maintenance of adipose tissue. However, it is not restricted to adipocytes as previously thought, but is found in a wide range of cultured cell lines, including fibroblasts, endothelial and epithelial cells, and tissues, such as lactating mammary gland, adrenal cortex, Sertoli and Leydig cells, and hepatocytes in alcoholic liver cirrhosis, suggesting that it may serve as a marker of lipid accumulation in diverse cell types and diseases. Alternatively spliced transcript variants have been found for this gene. 123 perilipin 2 NA
ENSG00000158874 APOA2 This gene encodes apolipoprotein (apo-) A-II, which is the second most abundant protein of the high density lipoprotein particles. The protein is found in plasma as a monomer, homodimer, or heterodimer with apolipoprotein D. Defects in this gene may result in apolipoprotein A-II deficiency or hypercholesterolemia. 336 apolipoprotein A2 NA
ENSG00000130707 ASS1 The protein encoded by this gene catalyzes the penultimate step of the arginine biosynthetic pathway. There are approximately 10 to 14 copies of this gene including the pseudogenes scattered across the human genome, among which the one located on chromosome 9 appears to be the only functional gene for argininosuccinate synthetase. Mutations in the chromosome 9 copy of this gene cause citrullinemia. Two transcript variants encoding the same protein have been found for this gene. 445 argininosuccinate synthase 1 NA
ENSG00000095321 CRAT This gene encodes carnitine acetyltransferase (CRAT), which is a key enzyme in the metabolic pathway in mitochondria, peroxisomes and endoplasmic reticulum. CRAT catalyzes the reversible transfer of acyl groups from an acyl-CoA thioester to carnitine and regulates the ratio of acylCoA/CoA in the subcellular compartments. Two transcript variants encoding different isoforms have been found for this gene. 1384 carnitine O-acetyltransferase NA
ENSG00000135480 KRT7 The protein encoded by this gene is a member of the keratin gene family. The type II cytokeratins consist of basic or neutral proteins which are arranged in pairs of heterotypic keratin chains coexpressed during differentiation of simple and stratified epithelial tissues. This type II cytokeratin is specifically expressed in the simple epithelia lining the cavities of the internal organs and in the gland ducts and blood vessels. The genes encoding the type II cytokeratins are clustered in a region of chromosome 12q12-q13. Alternative splicing may result in several transcript variants; however, not all variants have been fully described. 3855 keratin 7 NA
ENSG00000159069 FBXW5 This gene encodes a member of the F-box protein family, members of which are characterized by an approximately 40 amino acid motif, the F-box. The F-box proteins constitute one of the four subunits of ubiquitin protein ligase complex called SCFs (SKP1-cullin-F-box), which function in phosphorylation-dependent ubiquitination. The F-box proteins are divided into three classes: Fbws containing WD-40 domains, Fbls containing leucine-rich repeats, and Fbxs containing either different protein-protein interaction modules or no recognizable motifs. The protein encoded by this gene contains WD-40 domains, in addition to an F-box motif, so it belongs to the Fbw class. Alternatively spliced transcript variants encoding distinct isoforms have been identified for this gene, however, they were found to be nonsense-mediated mRNA decay (NMD) candidates, hence not represented. 54461 F-box and WD repeat domain containing 5 NA
ENSG00000175646 PRM1 NA 5619 protamine 1 NA
ENSG00000122367 LDB3 This gene encodes a PDZ domain-containing protein. PDZ motifs are modular protein-protein interaction domains consisting of 80-120 amino acid residues. PDZ domain-containing proteins interact with each other in cytoskeletal assembly or with other proteins involved in targeting and clustering of membrane proteins. The protein encoded by this gene interacts with alpha-actinin-2 through its N-terminal PDZ domain and with protein kinase C via its C-terminal LIM domains. The LIM domain is a cysteine-rich motif defined by 50-60 amino acids containing two zinc-binding modules. This protein also interacts with all three members of the myozenin family. Mutations in this gene have been associated with myofibrillar myopathy and dilated cardiomyopathy. Alternatively spliced transcript variants encoding different isoforms have been identified; all isoforms have N-terminal PDZ domains while only longer isoforms (1, 2 and 5) have C-terminal LIM domains. 11155 LIM domain binding 3 NA
ENSG00000060138 YBX3 NA 8531 Y-box binding protein 3 NA
ENSG00000130203 APOE The protein encoded by this gene is a major apoprotein of the chylomicron. It binds to a specific liver and peripheral cell receptor, and is essential for the normal catabolism of triglyceride-rich lipoprotein constituents. This gene maps to chromosome 19 in a cluster with the related apolipoprotein C1 and C2 genes. Mutations in this gene result in familial dysbetalipoproteinemia, or type III hyperlipoproteinemia (HLP III), in which increased plasma cholesterol and triglycerides are the consequence of impaired clearance of chylomicron and VLDL remnants. Alternative splicing results in multiple transcript variants. 348 apolipoprotein E NA
ENSG00000169129 AFAP1L2 NA 84632 actin filament associated protein 1 like 2 NA
ENSG00000133048 CHI3L1 Chitinases catalyze the hydrolysis of chitin, which is an abundant glycopolymer found in insect exoskeletons and fungal cell walls. The glycoside hydrolase 18 family of chitinases includes eight human family members. This gene encodes a glycoprotein member of the glycosyl hydrolase 18 family. The protein lacks chitinase activity and is secreted by activated macrophages, chondrocytes, neutrophils and synovial cells. The protein is thought to play a role in the process of inflammation and tissue remodeling. 1116 chitinase 3 like 1 NA
ENSG00000101210 EEF1A2 This gene encodes an isoform of the alpha subunit of the elongation factor-1 complex, which is responsible for the enzymatic delivery of aminoacyl tRNAs to the ribosome. This isoform (alpha 2) is expressed in brain, heart and skeletal muscle, and the other isoform (alpha 1) is expressed in brain, placenta, lung, liver, kidney, and pancreas. This gene may be critical in the development of ovarian cancer. 1917 eukaryotic translation elongation factor 1 alpha 2 NA
ENSG00000237973 MTCO1P12 NA ENSG00000237973 MT-CO1 pseudogene 12 NA
ENSG00000128591 FLNC This gene encodes one of three related filamin genes, specifically gamma filamin. These filamin proteins crosslink actin filaments into orthogonal networks in cortical cytoplasm and participate in the anchoring of membrane proteins for the actin cytoskeleton. Three functional domains exist in filamin: an N-terminal filamentous actin-binding domain, a C-terminal self-association domain, and a membrane glycoprotein-binding domain. Two transcript variants encoding different isoforms have been found for this gene. 2318 filamin C NA
ENSG00000158828 PINK1 This gene encodes a serine/threonine protein kinase that localizes to mitochondria. It is thought to protect cells from stress-induced mitochondrial dysfunction. Mutations in this gene cause one form of autosomal recessive early-onset Parkinson disease. 65018 PTEN induced putative kinase 1 NA
ENSG00000166598 HSP90B1 This gene encodes a member of a family of adenosine triphosphate(ATP)-metabolizing molecular chaperones with roles in stabilizing and folding other proteins. The encoded protein is localized to melanosomes and the endoplasmic reticulum. Expression of this protein is associated with a variety of pathogenic states, including tumor formation. There is a microRNA gene located within the 5’ exon of this gene. There are pseudogenes for this gene on chromosomes 1 and 15. 7184 heat shock protein 90kDa beta family member 1 NA
ENSG00000080824 HSP90AA1 The protein encoded by this gene is an inducible molecular chaperone that functions as a homodimer. The encoded protein aids in the proper folding of specific target proteins by use of an ATPase activity that is modulated by co-chaperones. Two transcript variants encoding different isoforms have been found for this gene. 3320 heat shock protein 90kDa alpha family class A member 1 NA
ENSG00000115112 TFCP2L1 NA 29842 transcription factor CP2-like 1 NA
ENSG00000189058 APOD This gene encodes a component of high density lipoprotein that has no marked similarity to other apolipoprotein sequences. It has a high degree of homology to plasma retinol-binding protein and other members of the alpha 2 microglobulin protein superfamily of carrier proteins, also known as lipocalins. This glycoprotein is closely associated with the enzyme lecithin:cholesterol acyltransferase - an enzyme involved in lipoprotein metabolism. 347 apolipoprotein D NA
ENSG00000175265 GOLGA8A The Golgi apparatus, which participates in glycosylation and transport of proteins and lipids in the secretory pathway, consists of a series of stacked, flattened membrane sacs referred to as cisternae. Interactions between the Golgi and microtubules are thought to be important for the reorganization of the Golgi after it fragments during mitosis. The golgins constitute a family of proteins which are localized to the Golgi. This gene encodes a golgin which structurally resembles its family member GOLGA2, suggesting that they may share a similar function. There are many similar copies of this gene on chromosome 15. Alternative splicing results in multiple transcript variants. 23015 golgin A8 family member A NA
ENSG00000143878 RHOB NA 388 ras homolog family member B NA
ENSG00000205517 RGL3 NA 57139 ral guanine nucleotide dissociation stimulator like 3 NA
ENSG00000104879 CKM The protein encoded by this gene is a cytoplasmic enzyme involved in energy homeostasis and is an important serum marker for myocardial infarction. The encoded protein reversibly catalyzes the transfer of phosphate between ATP and various phosphogens such as creatine phosphate. It acts as a homodimer in striated muscle as well as in other tissues, and as a heterodimer with a similar brain isozyme in heart. The encoded protein is a member of the ATP:guanido phosphotransferase protein family. 1158 creatine kinase, M-type NA
ENSG00000166347 CYB5A The protein encoded by this gene is a membrane-bound cytochrome that reduces ferric hemoglobin (methemoglobin) to ferrous hemoglobin, which is required for stearyl-CoA-desaturase activity. Defects in this gene are a cause of type IV hereditary methemoglobinemia. Three transcript variants encoding different isoforms have been found for this gene. 1528 cytochrome b5 type A NA
ENSG00000173641 HSPB7 NA 27129 heat shock protein family B (small) member 7 NA
ENSG00000151729 SLC25A4 This gene is a member of the mitochondrial carrier subfamily of solute carrier protein genes. The product of this gene functions as a gated pore that translocates ADP from the cytoplasm into the mitochondrial matrix and ATP from the mitochondrial matrix into the cytoplasm. The protein forms a homodimer embedded in the inner mitochondrial membrane. Mutations in this gene have been shown to result in autosomal dominant progressive external ophthalmoplegia and familial hypertrophic cardiomyopathy. 291 solute carrier family 25 member 4 NA
ENSG00000169738 DCXR The protein encoded by this gene acts as a homotetramer to catalyze diacetyl reductase and L-xylulose reductase reactions. The encoded protein may play a role in the uronate cycle of glucose metabolism and in the cellular osmoregulation in the proximal renal tubules. Defects in this gene are a cause of pentosuria. Two transcript variants encoding different isoforms have been found for this gene. 51181 dicarbonyl/L-xylulose reductase NA
ENSG00000088836 SLC4A11 This gene encodes a voltage-regulated, electrogenic sodium-coupled borate cotransporter that is essential for borate homeostasis, cell growth and cell proliferation. Mutations in this gene have been associated with a number of endothelial corneal dystrophies including recessive corneal endothelial dystrophy 2, corneal dystrophy and perceptive deafness, and Fuchs endothelial corneal dystrophy. Multiple transcript variants encoding different isoforms have been described. 83959 solute carrier family 4 member 11 NA
ENSG00000266844 RP11-862L9.3 NA ENSG00000266844 NA NA
ENSG00000054690 PLEKHH1 NA 57475 pleckstrin homology, MyTH4 and FERM domain containing H1 NA
ENSG00000265401 RP11-138I1.4 NA ENSG00000265401 NA NA
ENSG00000137198 GMPR This gene encodes an enzyme that catalyzes the irreversible and NADPH-dependent reductive deamination of GMP to IMP. The protein also functions in the re-utilization of free intracellular bases and purine nucleosides. 2766 guanosine monophosphate reductase NA
ENSG00000175206 NPPA The protein encoded by this gene belongs to the natriuretic peptide family. Natriuretic peptides are implicated in the control of extracellular fluid volume and electrolyte homeostasis. This protein is synthesized as a large precursor (containing a signal peptide), which is processed to release a peptide from the N-terminus with similarity to vasoactive peptide, cardiodilatin, and another peptide from the C-terminus with natriuretic-diuretic activity. Mutations in this gene have been associated with atrial fibrillation familial type 6. This gene is located adjacent to another member of the natriuretic family of peptides on chromosome 1. 4878 natriuretic peptide A NA
ENSG00000010318 PHF7 Spermatogenesis is a complex process regulated by extracellular and intracellular factors as well as cellular interactions among interstitial cells of the testis, Sertoli cells, and germ cells. This gene is expressed in the testis in Sertoli cells but not germ cells. The protein encoded by this gene contains plant homeodomain (PHD) finger domains, also known as leukemia associated protein (LAP) domains, believed to be involved in transcriptional regulation. The protein, which localizes to the nucleus of transfected cells, has been implicated in the transcriptional regulation of spermatogenesis. Alternate splicing results in multiple transcript variants of this gene. 51533 PHD finger protein 7 NA
ENSG00000269968 RP5-940J5.9 NA ENSG00000269968 NA NA
ENSG00000182718 ANXA2 This gene encodes a member of the annexin family. Members of this calcium-dependent phospholipid-binding protein family play a role in the regulation of cellular growth and in signal transduction pathways. This protein functions as an autocrine factor which heightens osteoclast formation and bone resorption. This gene has three pseudogenes located on chromosomes 4, 9 and 10, respectively. Multiple alternatively spliced transcript variants encoding different isoforms have been found for this gene. 302 annexin A2 NA
ENSG00000198467 TPM2 This gene encodes beta-tropomyosin, a member of the actin filament binding protein family, and mainly expressed in slow, type 1 muscle fibers. Mutations in this gene can alter the expression of other sarcomeric tropomyosin proteins, and cause cap disease, nemaline myopathy and distal arthrogryposis syndromes. Alternatively spliced transcript variants encoding different isoforms have been found for this gene. 7169 tropomyosin 2 (beta) NA
ENSG00000101605 MYOM1 The giant protein titin, together with its associated proteins, interconnects the major structure of sarcomeres, the M bands and Z discs. The C-terminal end of the titin string extends into the M line, where it binds tightly to M-band constituents of apparent molecular masses of 190 kD (myomesin 1) and 165 kD (myomesin 2). This protein, myomesin 1, like myomesin 2, titin, and other myofibrillar proteins contains structural modules with strong homology to either fibronectin type III (motif I) or immunoglobulin C2 (motif II) domains. Myomesin 1 and myomesin 2 each have a unique N-terminal region followed by 12 modules of motif I or motif II, in the arrangement II-II-I-I-I-I-I-II-II-II-II-II. The two proteins share 50% sequence identity in this repeat-containing region. The head structure formed by these 2 proteins on one end of the titin string extends into the center of the M band. The integrating structure of the sarcomere arises from muscle-specific members of the superfamily of immunoglobulin-like proteins. Alternatively spliced transcript variants encoding different isoforms have been identified. 8736 myomesin 1 NA
ENSG00000185100 ADSSL1 This gene encodes a member of the adenylosuccinate synthase family of proteins. The encoded muscle-specific enzyme plays a role in the purine nucleotide cycle by catalyzing the first step in the conversion of inosine monophosphate (IMP) to adenosine monophosphate (AMP). Mutations in this gene may cause adolescent onset distal myopathy. Alternative splicing results in multiple transcript variants. 122622 adenylosuccinate synthase like 1 NA
ENSG00000115255 REEP6 NA 92840 receptor accessory protein 6 NA
ENSG00000143549 TPM3 This gene encodes a member of the tropomyosin family of actin-binding proteins. Tropomyosins are dimers of coiled-coil proteins that provide stability to actin filaments and regulate access of other actin-binding proteins. Mutations in this gene result in autosomal dominant nemaline myopathy and other muscle disorders. This locus is involved in translocations with other loci, including anaplastic lymphoma receptor tyrosine kinase (ALK) and neurotrophic tyrosine kinase receptor type 1 (NTRK1), which result in the formation of fusion proteins that act as oncogenes. There are numerous pseudogenes for this gene on different chromosomes. Alternative splicing results in multiple transcript variants. 7170 tropomyosin 3 NA
ENSG00000127884 ECHS1 The protein encoded by this gene functions in the second step of the mitochondrial fatty acid beta-oxidation pathway. It catalyzes the hydration of 2-trans-enoyl-coenzyme A (CoA) intermediates to L-3-hydroxyacyl-CoAs. The gene product is a member of the hydratase/isomerase superfamily. It localizes to the mitochondrial matrix. Transcript variants utilizing alternative transcription initiation sites have been described in the literature. 1892 enoyl-CoA hydratase, short chain, 1, mitochondrial NA
ENSG00000171992 SYNPO Synaptopodin is an actin-associated protein that may play a role in actin-based cell shape and motility. The name synaptopodin derives from the protein’s associations with postsynaptic densities and dendritic spines and with renal podocytes (Mundel et al., 1997 [PubMed 9314539]). 11346 synaptopodin NA
ENSG00000129538 RNASE1 This gene encodes a member of the pancreatic-type of secretory ribonucleases, a subset of the ribonuclease A superfamily. The encoded endonuclease cleaves internal phosphodiester RNA bonds on the 3’-side of pyrimidine bases. It prefers poly(C) as a substrate and hydrolyzes 2’,3’-cyclic nucleotides, with a pH optimum near 8.0. The encoded protein is monomeric and more commonly acts to degrade ds-RNA over ss-RNA. Alternative splicing occurs at this locus and four transcript variants encoding the same protein have been identified. 6035 ribonuclease A family member 1, pancreatic NA
ENSG00000234745 HLA-B HLA-B belongs to the HLA class I heavy chain paralogues. This class I molecule is a heterodimer consisting of a heavy chain and a light chain (beta-2 microglobulin). The heavy chain is anchored in the membrane. Class I molecules play a central role in the immune system by presenting peptides derived from the endoplasmic reticulum lumen. They are expressed in nearly all cells. The heavy chain is approximately 45 kDa and its gene contains 8 exons. Exon 1 encodes the leader peptide, exon 2 and 3 encode the alpha1 and alpha2 domains, which both bind the peptide, exon 4 encodes the alpha3 domain, exon 5 encodes the transmembrane region and exons 6 and 7 encode the cytoplasmic tail. Polymorphisms within exon 2 and exon 3 are responsible for the peptide binding specificity of each class one molecule. Typing for these polymorphisms is routinely done for bone marrow and kidney transplantation. Hundreds of HLA-B alleles have been described. 3106 major histocompatibility complex, class I, B NA
ENSG00000123240 OPTN This gene encodes the coiled-coil containing protein optineurin. Optineurin may play a role in normal-tension glaucoma and adult-onset primary open angle glaucoma. Optineurin interacts with adenovirus E3-14.7K protein and may utilize tumor necrosis factor-alpha or Fas-ligand pathways to mediate apoptosis, inflammation or vasoconstriction. Optineurin may also function in cellular morphogenesis and membrane trafficking, vesicle trafficking, and transcription activation through its interactions with the RAB8, huntingtin, and transcription factor IIIA proteins. Alternative splicing results in multiple transcript variants encoding the same protein. 10133 optineurin NA
ENSG00000106538 RARRES2 This gene encodes a secreted chemotactic protein that initiates chemotaxis via the ChemR23 G protein-coupled seven-transmembrane domain ligand. Expression of this gene is upregulated by the synthetic retinoid tazarotene and occurs in a wide variety of tissues. The active protein has several roles, including that as an adipokine and as an antimicrobial protein with activity against bacteria and fungi. 5919 retinoic acid receptor responder 2 NA
ENSG00000143164 DCAF6 NA 55827 DDB1 and CUL4 associated factor 6 NA
ENSG00000182054 IDH2 Isocitrate dehydrogenases catalyze the oxidative decarboxylation of isocitrate to 2-oxoglutarate. These enzymes belong to two distinct subclasses, one of which utilizes NAD(+) as the electron acceptor and the other NADP(+). Five isocitrate dehydrogenases have been reported: three NAD(+)-dependent isocitrate dehydrogenases, which localize to the mitochondrial matrix, and two NADP(+)-dependent isocitrate dehydrogenases, one of which is mitochondrial and the other predominantly cytosolic. Each NADP(+)-dependent isozyme is a homodimer. The protein encoded by this gene is the NADP(+)-dependent isocitrate dehydrogenase found in the mitochondria. It plays a role in intermediary metabolism and energy production. This protein may tightly associate or interact with the pyruvate dehydrogenase complex. Alternative splicing results in multiple transcript variants. 3418 isocitrate dehydrogenase (NADP(+)) 2, mitochondrial NA
ENSG00000239775 AC017116.11 NA ENSG00000239775 NA NA
ENSG00000106258 CYP3A5 This gene encodes a member of the cytochrome P450 superfamily of enzymes. The cytochrome P450 proteins are monooxygenases which catalyze many reactions involved in drug metabolism and synthesis of cholesterol, steroids and other lipids. The encoded protein metabolizes drugs as well as the steroid hormones testosterone and progesterone. This gene is part of a cluster of cytochrome P450 genes on chromosome 7q21.1. Two pseudogenes of this gene have been identified within this cluster on chromosome 7. Expression of this gene is widely variable among populations, and a single nucleotide polymorphism that affects transcript splicing has been associated with susceptibility to hypertension. Alternative splicing results in multiple transcript variants. 1577 cytochrome P450 family 3 subfamily A member 5 NA
ENSG00000148672 GLUD1 This gene encodes glutamate dehydrogenase, which is a mitochondrial matrix enzyme that catalyzes the oxidative deamination of glutamate to alpha-ketoglutarate and ammonia. This enzyme has an important role in regulating amino acid-induced insulin secretion. It is allosterically activated by ADP and inhibited by GTP and ATP. Activating mutations in this gene are a common cause of congenital hyperinsulinism. Alternative splicing of this gene results in multiple transcript variants. The related glutamate dehydrogenase 2 gene on the human X-chromosome originated from this gene via retrotransposition and encodes a soluble form of glutamate dehydrogenase. Related pseudogenes have been identified on chromosomes 10, 18 and X. 2746 glutamate dehydrogenase 1 NA
ENSG00000196091 MYBPC1 This gene encodes a member of the myosin-binding protein C family. Myosin-binding protein C family members are myosin-associated proteins found in the cross-bridge-bearing zone (C region) of A bands in striated muscle. The encoded protein is the slow skeletal muscle isoform of myosin-binding protein C and plays an important role in muscle contraction by recruiting muscle-type creatine kinase to myosin filaments. Mutations in this gene are associated with distal arthrogryposis type I. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene. 4604 myosin binding protein C, slow type NA
ENSG00000073060 SCARB1 The protein encoded by this gene is a plasma membrane receptor for high density lipoprotein cholesterol (HDL). The encoded protein mediates cholesterol transfer to and from HDL. In addition, this protein is a receptor for hepatitis C virus glycoprotein E2. Two transcript variants encoding different isoforms have been found for this gene. 949 scavenger receptor class B member 1 NA
ENSG00000116171 SCP2 This gene encodes two proteins: sterol carrier protein X (SCPx) and sterol carrier protein 2 (SCP2), as a result of transcription initiation from 2 independently regulated promoters. The transcript initiated from the proximal promoter encodes the longer SCPx protein, and the transcript initiated from the distal promoter encodes the shorter SCP2 protein, with the 2 proteins sharing a common C-terminus. Evidence suggests that the SCPx protein is a peroxisome-associated thiolase that is involved in the oxidation of branched chain fatty acids, while the SCP2 protein is thought to be an intracellular lipid transfer protein. This gene is highly expressed in organs involved in lipid metabolism, and may play a role in Zellweger syndrome, in which cells are deficient in peroxisomes and have impaired bile acid synthesis. Alternative splicing of this gene produces multiple transcript variants, some encoding different isoforms. 6342 sterol carrier protein 2 NA
ENSG00000175899 A2M Alpha-2-macroglobulin is a protease inhibitor and cytokine transporter. It inhibits many proteases, including trypsin, thrombin and collagenase. A2M is implicated in Alzheimer disease (AD) due to its ability to mediate the clearance and degradation of A-beta, the major component of beta-amyloid deposits. 2 alpha-2-macroglobulin NA
ENSG00000086015 MAST2 NA 23139 microtubule associated serine/threonine kinase 2 NA
ENSG00000149925 ALDOA The protein encoded by this gene, Aldolase A (fructose-bisphosphate aldolase), is a glycolytic enzyme that catalyzes the reversible conversion of fructose-1,6-bisphosphate to glyceraldehyde 3-phosphate and dihydroxyacetone phosphate. Three aldolase isozymes (A, B, and C), encoded by three different genes, are differentially expressed during development. Aldolase A is found in the developing embryo and is produced in even greater amounts in adult muscle. Aldolase A expression is repressed in adult liver, kidney and intestine and similar to aldolase C levels in brain and other nervous tissue. Aldolase A deficiency has been associated with myopathy and hemolytic anemia. Alternative splicing and alternative promoter usage results in multiple transcript variants. Related pseudogenes have been identified on chromosomes 3 and 10. 226 aldolase, fructose-bisphosphate A NA
ENSG00000119938 PPP1R3C This gene encodes a regulatory subunit of protein phosphatase-1 (PP1). PP1 catalyzes reversible protein phosphorylation, which is important in a wide range of cellular activities: neuronal, muscular, RNA splicing, protein synthesis, cell death, and glycogen metabolism, to name just a few. By interacting with different regulatory subunits, PP1 is directed to different parts of the cell, to different substrates, or to respond to extracellular signals. 5507 protein phosphatase 1 regulatory subunit 3C NA
ENSG00000167701 GPT This gene encodes cytosolic alanine aminotransaminase 1 (ALT1); also known as glutamate-pyruvate transaminase 1. This enzyme catalyzes the reversible transamination between alanine and 2-oxoglutarate to generate pyruvate and glutamate and, therefore, plays a key role in the intermediary metabolism of glucose and amino acids. Serum activity levels of this enzyme are routinely used as a biomarker of liver injury caused by drug toxicity, infection, alcohol, and steatosis. A related gene on chromosome 16 encodes a putative mitochondrial alanine aminotransaminase. 2875 glutamic-pyruvate transaminase (alanine aminotransferase) NA
write.table(as.factor(out$query), paste0("../utilities/GTEX2013_sparse_load_sqrt/gene_names_clus_",16,".txt"), col.names = FALSE,
            row.names=FALSE, quote=FALSE);
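
The per-factor export above repeats for every factor, with the factor index typed by hand into the file name. As a minimal sketch, and not part of the original analysis, the step could be wrapped in a small helper. The name `write_factor_genes` and its arguments are hypothetical, and the demo writes to a temporary directory rather than `../utilities/GTEX2013_sparse_load_sqrt/`.

```r
# Hypothetical helper (an assumption, not from the original analysis):
# writes the query IDs for one factor to a text file, one ID per line.
write_factor_genes <- function(query_ids, factor_index, out_dir) {
  # Derive the file name from the factor index so the two cannot drift apart.
  out_file <- file.path(out_dir,
                        paste0("gene_names_clus_", factor_index, ".txt"))
  write.table(as.character(query_ids), out_file,
              col.names = FALSE, row.names = FALSE, quote = FALSE)
  # Return the path invisibly so callers can check or reuse it.
  invisible(out_file)
}

# Demo with two Ensembl IDs that appear in the tables of this report,
# standing in for out$query; tempdir() keeps the demo self-contained.
demo_ids <- c("ENSG00000175899", "ENSG00000198467")
demo_file <- write_factor_genes(demo_ids, 16, tempdir())
```

In the report itself the call would presumably look like `write_factor_genes(out$query, 16, "../utilities/GTEX2013_sparse_load_sqrt")`, assuming that directory already exists.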

Factor 17 Annotations

out <- mygene::queryMany(gene_list[17,],  scopes="ensembl.gene", fields=c("name", "summary", "symbol"), species="human");
## Finished
## Pass returnall=TRUE to return lists of duplicate or missing query terms.
kable(as.data.frame(out))
summary X_id symbol name query notfound
The protein encoded by this gene localizes to focal adhesions, regions of the plasma membrane where the cell attaches to the extracellular matrix. This protein crosslinks actin filaments and contains a Src homology 2 (SH2) domain, which is often found in molecules involved in signal transduction. This protein is a substrate of calpain II. Alternative splicing results in multiple transcript variants encoding different isoforms. 7145 TNS1 tensin 1 ENSG00000079308 NA
NA 8404 SPARCL1 SPARC like 1 ENSG00000152583 NA
This gene encodes a preproprotein that is proteolytically processed to form multiple protein products. The major encoded protein product, lactadherin, is a membrane glycoprotein that promotes phagocytosis of apoptotic cells. This protein has also been implicated in wound healing, autoimmune disease, and cancer. Lactadherin can be further processed to form a smaller cleavage product, medin, which comprises the major protein component of aortic medial amyloid (AMA). Alternative splicing results in multiple transcript variants. 4240 MFGE8 milk fat globule-EGF factor 8 protein ENSG00000140545 NA
This gene encodes a member of the intermediate filament family. Intermediate filaments, along with microtubules and actin microfilaments, make up the cytoskeleton. The protein encoded by this gene is responsible for maintaining cell shape, integrity of the cytoplasm, and stabilizing cytoskeletal interactions. It is also involved in the immune response, and controls the transport of low-density lipoprotein (LDL)-derived cholesterol from a lysosome to the site of esterification. It functions as an organizer of a number of critical proteins involved in attachment, migration, and cell signaling. Mutations in this gene cause a dominant, pulverulent cataract. 7431 VIM vimentin ENSG00000026025 NA
This gene encodes a member of the regulators of G protein signaling (RGS) family. The RGS proteins are signal transduction molecules which are involved in the regulation of heterotrimeric G proteins by acting as GTPase activators. This gene is a hypoxia-inducible factor-1 dependent, hypoxia-induced gene which is involved in the induction of endothelial apoptosis. This gene is also one of three genes on chromosome 1q contributing to elevated blood pressure. Alternatively spliced transcript variants have been identified. 8490 RGS5 regulator of G-protein signaling 5 ENSG00000143248 NA
The protein encoded by this gene is a leucine-rich repeat protein present in connective tissue extracellular matrix. This protein functions as a molecule anchoring basement membranes to the underlying connective tissue. This protein has been shown to bind type I collagen to basement membranes and type II collagen to cartilage. It also binds the basement membrane heparan sulfate proteoglycan perlecan. This protein is suggested to be involved in the pathogenesis of Hutchinson-Gilford progeria (HGP), which is reported to lack the binding of collagen in basement membranes and cartilage. Alternatively spliced transcript variants encoding the same protein have been observed. 5549 PRELP proline and arginine rich end leucine rich repeat protein ENSG00000188783 NA
This gene belongs to the TIMP gene family. The proteins encoded by this gene family are inhibitors of the matrix metalloproteinases, a group of peptidases involved in degradation of the extracellular matrix (ECM). Expression of this gene is induced in response to mitogenic stimulation and this netrin domain-containing protein is localized to the ECM. Mutations in this gene have been associated with the autosomal dominant disorder Sorsby’s fundus dystrophy. 7078 TIMP3 TIMP metallopeptidase inhibitor 3 ENSG00000100234 NA
The protein encoded by this gene is secreted and likely acts as an inhibitor of bone formation. The encoded protein is found in the organic matrix of bone and cartilage. Defects in this gene are a cause of Keutel syndrome (KS). Two transcript variants encoding different isoforms have been found for this gene. 4256 MGP matrix Gla protein ENSG00000111341 NA
This gene encodes a member of the insulin-like growth factor (IGF)-binding protein (IGFBP) family. IGFBPs bind IGFs with high affinity, and regulate IGF availability in body fluids and tissues and modulate IGF binding to its receptors. This protein binds IGF-I and IGF-II with relatively low affinity, and belongs to a subfamily of low-affinity IGFBPs. It also stimulates prostacyclin production and cell adhesion. Alternatively spliced transcript variants encoding different isoforms have been described for this gene, and one variant has been associated with retinal arterial macroaneurysm (PMID:21835307). 3490 IGFBP7 insulin like growth factor binding protein 7 ENSG00000163453 NA
Transglutaminases are enzymes that catalyze the crosslinking of proteins by epsilon-gamma glutamyl lysine isopeptide bonds. While the primary structure of transglutaminases is not conserved, they all have the same amino acid sequence at their active sites and their activity is calcium-dependent. The protein encoded by this gene acts as a monomer, is induced by retinoic acid, and appears to be involved in apoptosis. Finally, the encoded protein is the autoantigen implicated in celiac disease. Two transcript variants encoding different isoforms have been found for this gene. 7052 TGM2 transglutaminase 2 ENSG00000198959 NA
This gene encodes a member of carboxypeptidase A protein family. The encoded protein may function as a transcriptional repressor and play a role in adipogenesis and smooth muscle cell differentiation. Studies in mice suggest that this gene functions in wound healing and abdominal wall development. Overexpression of this gene is associated with glioblastoma. 165 AEBP1 AE binding protein 1 ENSG00000106624 NA
This gene encodes a protein with an N-terminal half that contains cysteine/histidine motifs and leucine zipper-like repeats, and the C-terminal half is rich in arginine and glutamate residues (RE domain) and arginine and serine residues (RS domain). This protein localizes with a speckled pattern in the nucleus, and could be involved in the formation of spliceosome via the RE and RS domains. Two alternatively spliced transcript variants encoding the same protein have been found for this gene. 51747 LUC7L3 LUC7 like 3 pre-mRNA splicing factor ENSG00000108848 NA
NA 23524 SRRM2 serine/arginine repetitive matrix 2 ENSG00000167978 NA
The leiomodin 1 protein has a putative membrane-spanning region and 2 types of tandemly repeated blocks. The transcript is expressed in all tissues tested, with the highest levels in thyroid, eye muscle, skeletal muscle, and ovary. Increased expression of leiomodin 1 may be linked to Graves’ disease and thyroid-associated ophthalmopathy. 25802 LMOD1 leiomodin 1 ENSG00000163431 NA
NA 116983 ACAP3 ArfGAP with coiled-coil, ankyrin repeat and PH domains 3 ENSG00000131584 NA
This gene encodes a transcription factor involved in the induction of genes regulated by oxygen, which is induced as oxygen levels fall. The encoded protein contains a basic-helix-loop-helix domain protein dimerization domain as well as a domain found in proteins in signal transduction pathways which respond to oxygen levels. Mutations in this gene are associated with erythrocytosis familial type 4. 2034 EPAS1 endothelial PAS domain protein 1 ENSG00000116016 NA
NA 6625 SNRNP70 small nuclear ribonucleoprotein U1 subunit 70 ENSG00000104852 NA
The product of this gene belongs to the actin-binding proteins ADF family. This family of proteins is responsible for enhancing the turnover rate of actin in vivo. This gene encodes the actin depolymerizing protein that severs actin filaments (F-actin) and binds to actin monomers (G-actin). Two transcript variants encoding distinct isoforms have been identified for this gene. 11034 DSTN destrin, actin depolymerizing factor ENSG00000125868 NA
This gene encodes a member of a subfamily of LIM domain proteins that are characterized by an N-terminal proline-rich region and three C-terminal LIM domains. The encoded protein localizes to the cell periphery in focal adhesions and may be involved in cell-cell adhesion and cell motility. This protein also shuttles through the nucleus and may function as a transcriptional co-activator. This gene is located at the junction of certain disease-related chromosomal translocations, which result in the expression of chimeric proteins that may promote tumor growth. Alternative splicing results in multiple transcript variants. 4026 LPP LIM domain containing preferred translocation partner in lipoma ENSG00000145012 NA
Alpha-2-macroglobulin is a protease inhibitor and cytokine transporter. It inhibits many proteases, including trypsin, thrombin and collagenase. A2M is implicated in Alzheimer disease (AD) due to its ability to mediate the clearance and degradation of A-beta, the major component of beta-amyloid deposits. 2 A2M alpha-2-macroglobulin ENSG00000175899 NA
This gene encodes a cytoskeletal LIM protein that binds to actin filaments via a domain that is homologous to erythrocyte dematin. LIM domains, found in over 60 proteins, play key roles in the regulation of developmental pathways. LIM domains also function as protein-binding interfaces, mediating specific protein-protein interactions. The protein encoded by this gene could mediate such interactions between actin filaments and cytoplasmic targets. Alternatively spliced transcript variants encoding different isoforms have been identified. 3983 ABLIM1 actin binding LIM protein 1 ENSG00000099204 NA
This gene encodes a member of the polycystin protein family. The encoded glycoprotein contains a large N-terminal extracellular region, multiple transmembrane domains and a cytoplasmic C-tail. It is an integral membrane protein that functions as a regulator of calcium permeable cation channels and intracellular calcium homoeostasis. It is also involved in cell-cell/matrix interactions and may modulate G-protein-coupled signal-transduction pathways. It plays a role in renal tubular development, and mutations in this gene cause autosomal dominant polycystic kidney disease type 1 (ADPKD1). ADPKD1 is characterized by the growth of fluid-filled cysts that replace normal renal tissue and result in end-stage renal failure. Splice variants encoding different isoforms have been noted for this gene. Also, six pseudogenes, closely linked in a known duplicated region on chromosome 16p, have been described. 5310 PKD1 polycystin 1, transient receptor potential channel interacting ENSG00000008710 NA
This gene encodes a type IV collagen alpha protein. Type IV collagen proteins are integral components of basement membranes. This gene shares a bidirectional promoter with a paralogous gene on the opposite strand. The protein consists of an amino-terminal 7S domain, a triple-helix forming collagenous domain, and a carboxy-terminal non-collagenous domain. It functions as part of a heterotrimer and interacts with other extracellular matrix components such as perlecans, proteoglycans, and laminins. In addition, proteolytic cleavage of the non-collagenous carboxy-terminal domain results in a biologically active fragment known as arresten, which has anti-angiogenic and tumor suppressor properties. Mutations in this gene cause porencephaly, cerebrovascular disease, and renal and muscular defects. Alternative splicing results in multiple transcript variants. 1282 COL4A1 collagen type IV alpha 1 chain ENSG00000187498 NA
NA 388 RHOB ras homolog family member B ENSG00000143878 NA
This gene encodes a cytoskeletal protein that is concentrated in areas of cell-substratum and cell-cell contacts. The encoded protein plays a significant role in the assembly of actin filaments and in spreading and migration of various cell types, including fibroblasts and osteoclasts. It codistributes with integrins in the cell surface membrane in order to assist in the attachment of adherent cells to extracellular matrices and of lymphocytes to other cells. The N-terminus of this protein contains elements for localization to cell-extracellular matrix junctions. The C-terminus contains binding sites for proteins such as beta-1-integrin, actin, and vinculin. 7094 TLN1 talin 1 ENSG00000137076 NA
This gene encodes a member of the fibrillar collagen family, and plays a role during the calcification of cartilage and the transition of cartilage to bone. The encoded protein product is a preproprotein. It includes an N-terminal signal peptide, which is followed by an N-terminal propeptide, mature peptide and a C-terminal propeptide. The N-terminal propeptide contains thrombospondin N-terminal-like and laminin G-like domains. The mature peptide is a major triple-helical region. The C-terminal propeptide, also known as COLFI domain, plays crucial roles in tissue growth and repair. Mutations in this gene cause Steel syndrome. Alternatively spliced transcript variants have been found, but the full-length nature of some variants has not been determined. 85301 COL27A1 collagen type XXVII alpha 1 ENSG00000196739 NA
This gene encodes one of the six subunits of type IV collagen, the major structural component of basement membranes. The C-terminal portion of the protein, known as canstatin, is an inhibitor of angiogenesis and tumor growth. Like the other members of the type IV collagen gene family, this gene is organized in a head-to-head conformation with another type IV collagen gene so that each gene pair shares a common promoter. 1284 COL4A2 collagen type IV alpha 2 ENSG00000134871 NA
Growth arrest-specific 7 is expressed primarily in terminally differentiated brain cells and predominantly in mature cerebellar Purkinje neurons. GAS7 plays a putative role in neuronal development. Several transcript variants encoding proteins which vary in the N-terminus have been described. 8522 GAS7 growth arrest specific 7 ENSG00000007237 NA
The protein encoded by this gene is a transformation and shape-change sensitive actin cross-linking/gelling protein found in fibroblasts and smooth muscle. Its expression is down-regulated in many cell lines, and this down-regulation may be an early and sensitive marker for the onset of transformation. A functional role of this protein is unclear. Two transcript variants encoding the same protein have been found for this gene. 6876 TAGLN transgelin ENSG00000149591 NA
This gene encodes beta-tropomyosin, a member of the actin filament binding protein family, and mainly expressed in slow, type 1 muscle fibers. Mutations in this gene can alter the expression of other sarcomeric tropomyosin proteins, and cause cap disease, nemaline myopathy and distal arthrogryposis syndromes. Alternatively spliced transcript variants encoding different isoforms have been found for this gene. 7169 TPM2 tropomyosin 2 (beta) ENSG00000198467 NA
This gene encodes a cytoskeletal protein that is required for organizing the actin cytoskeleton. The protein is a component of actin-containing microfilaments, and it is involved in the control of cell shape, adhesion, and contraction. Polymorphisms in this gene are associated with a susceptibility to pancreatic cancer type 1, and also with a risk for myocardial infarction. Alternative splicing results in multiple transcript variants. 23022 PALLD palladin, cytoskeletal associated protein ENSG00000129116 NA
The Golgi apparatus, which participates in glycosylation and transport of proteins and lipids in the secretory pathway, consists of a series of stacked, flattened membrane sacs referred to as cisternae. Interactions between the Golgi and microtubules are thought to be important for the reorganization of the Golgi after it fragments during mitosis. The golgins constitute a family of proteins which are localized to the Golgi. This gene encodes a golgin which structurally resembles its family member GOLGA2, suggesting that they may share a similar function. There are many similar copies of this gene on chromosome 15. Alternative splicing results in multiple transcript variants. 23015 GOLGA8A golgin A8 family member A ENSG00000175265 NA
The membrane-associated protein encoded by this gene is a member of the superfamily of ATP-binding cassette (ABC) transporters. ABC proteins transport various molecules across extra- and intracellular membranes. ABC genes are divided into seven distinct subfamilies (ABC1, MDR/TAP, MRP, ALD, OABP, GCN20, White). This protein is a member of the ABC1 subfamily. Members of the ABC1 subfamily comprise the only major ABC subfamily found exclusively in multicellular eukaryotes. This protein is highly expressed in brain tissue and may play a role in macrophage lipid metabolism and neural development. Two transcript variants encoding different isoforms have been found for this gene. 20 ABCA2 ATP binding cassette subfamily A member 2 ENSG00000107331 NA
The protein encoded by the classic MBP gene is a major constituent of the myelin sheath of oligodendrocytes and Schwann cells in the nervous system. However, MBP-related transcripts are also present in the bone marrow and the immune system. These mRNAs arise from the long MBP gene (otherwise called ‘Golli-MBP’) that contains 3 additional exons located upstream of the classic MBP exons. Alternative splicing from the Golli and the MBP transcription start sites gives rise to 2 sets of MBP-related transcripts and gene products. The Golli mRNAs contain 3 exons unique to Golli-MBP, spliced in-frame to 1 or more MBP exons. They encode hybrid proteins that have N-terminal Golli aa sequence linked to MBP aa sequence. The second family of transcripts contain only MBP exons and produce the well characterized myelin basic proteins. This complex gene structure is conserved among species suggesting that the MBP transcription unit is an integral part of the Golli transcription unit and that this arrangement is important for the function and/or regulation of these genes. 4155 MBP myelin basic protein ENSG00000197971 NA
This gene encodes one of the three enolase isoenzymes found in mammals. This isoenzyme, a homodimer, is found in mature neurons and cells of neuronal origin. A switch from alpha enolase to gamma enolase occurs in neural tissue during development in rats and primates. 2026 ENO2 enolase 2 ENSG00000111674 NA
The protein encoded by this gene is a member of the formin-binding-protein family. The protein contains an N-terminal Fer/Cdc42-interacting protein 4 (CIP4) homology (FCH) domain followed by a coiled-coil domain, a proline-rich motif, a second coiled-coil domain, a Rho family protein-binding domain (RBD), and a C-terminal SH3 domain. This protein binds sorting nexin 2 (SNX2), tankyrase (TNKS), and dynamin; an interaction between this protein and formin has not been demonstrated yet in human. 23048 FNBP1 formin binding protein 1 ENSG00000187239 NA
NA 140710 SOGA1 suppressor of glucose, autophagy associated 1 ENSG00000149639 NA
PPFIA4, or liprin-alpha-4, belongs to the liprin-alpha gene family. See liprin-alpha-1 (LIP1, or PPFIA1; MIM 611054) for background on liprins. 8497 PPFIA4 PTPRF interacting protein alpha 4 ENSG00000143847 NA
The protein encoded by this gene is a member of the serine/arginine (SR)-rich family of pre-mRNA splicing factors, which constitute part of the spliceosome. Each of these factors contains an RNA recognition motif (RRM) for binding RNA and an RS domain for binding other proteins. The RS domain is rich in serine and arginine residues and facilitates interaction between different SR splicing factors. In addition to being critical for mRNA splicing, the SR proteins have also been shown to be involved in mRNA export from the nucleus and in translation. Alternative splicing results in multiple transcript variants. 6430 SRSF5 serine and arginine rich splicing factor 5 ENSG00000100650 NA
This gene encodes a CBL-associated protein which functions in the signaling and stimulation of insulin. Mutations in this gene may be associated with human disorders of insulin resistance. Alternative splicing results in multiple transcript variants. 10580 SORBS1 sorbin and SH3 domain containing 1 ENSG00000095637 NA
NA 7089 TLE2 transducin like enhancer of split 2 ENSG00000065717 NA
NA 27129 HSPB7 heat shock protein family B (small) member 7 ENSG00000173641 NA
NA NA NA NA ENSG00000256309 TRUE
Chloride channels are a diverse group of proteins that regulate fundamental cellular processes including stabilization of cell membrane potential, transepithelial transport, maintenance of intracellular pH, and regulation of cell volume. Chloride intracellular channel 4 (CLIC4) protein, encoded by the CLIC4 gene, is a member of the p64 family; the gene is expressed in many tissues and exhibits an intracellular vesicular pattern in Panc-1 cells (pancreatic cancer cells). 25932 CLIC4 chloride intracellular channel 4 ENSG00000169504 NA
A human melanoma-associated chondroitin sulfate proteoglycan plays a role in stabilizing cell-substratum interactions during early events of melanoma cell spreading on endothelial basement membranes. CSPG4 represents an integral membrane chondroitin sulfate proteoglycan expressed by human malignant melanoma cells. 1464 CSPG4 chondroitin sulfate proteoglycan 4 ENSG00000173546 NA
Alpha actinins belong to the spectrin gene superfamily which represents a diverse group of cytoskeletal proteins, including the alpha and beta spectrins and dystrophins. Alpha actinin is an actin-binding protein with multiple roles in different cell types. In nonmuscle cells, the cytoskeletal isoform is found along microfilament bundles and adherens-type junctions, where it is involved in binding actin to the membrane. In contrast, skeletal, cardiac, and smooth muscle isoforms are localized to the Z-disc and analogous dense bodies, where they help anchor the myofibrillar actin filaments. This gene encodes a nonmuscle, cytoskeletal, alpha actinin isoform and maps to the same site as the structurally similar erythroid beta spectrin gene. Three transcript variants encoding different isoforms have been found for this gene. 87 ACTN1 actinin alpha 1 ENSG00000072110 NA
This gene encodes a nonadrenergic imidazoline-1 receptor protein that localizes to the cytosol and anchors to the inner layer of the plasma membrane. The orthologous mouse protein has been shown to influence cytoskeletal organization and cell migration by binding to alpha-5-beta-1 integrin. In humans, this protein has been shown to bind to the adapter insulin receptor substrate 4 (IRS4) to mediate translocation of alpha-5 integrin from the cell membrane to endosomes. Expression of this protein was reduced in human breast cancers while its overexpression reduced tumor growth and metastasis, possibly by limiting the expression of alpha-5 integrin. In human cardiac tissue, this gene was found to affect cell growth and death while in neural tissue it affected neuronal growth and differentiation. Alternative splicing results in multiple transcript variants encoding different isoforms. Some isoforms lack the expected C-terminal domains of a functional imidazoline receptor. 11188 NISCH nischarin ENSG00000010322 NA
NA 25957 PNISR PNN interacting serine and arginine rich protein ENSG00000132424 NA
The product of this gene belongs to the Serine/Threonine protein kinase family, and to the Ca(2+)/calmodulin-dependent protein kinase subfamily. The major isoform of this gene plays a role in the calcium/calmodulin-dependent (CaM) kinase cascade by phosphorylating the downstream kinases CaMK1 and CaMK4. Protein products of this gene also phosphorylate AMP-activated protein kinase (AMPK). This gene has its strongest expression in the brain and influences signalling cascades involved with learning and memory, neuronal differentiation and migration, neurite outgrowth, and synapse formation. Alternative splicing results in multiple transcript variants encoding distinct isoforms. The identified isoforms differ in their ability to undergo autophosphorylation and to phosphorylate downstream kinases. 10645 CAMKK2 calcium/calmodulin-dependent protein kinase kinase 2 ENSG00000110931 NA
This gene encodes a member of the EPS8 gene family. The encoded protein, like other members of the family, is thought to link growth factor stimulation to actin organization, generating functional redundancy in the pathways that regulate actin cytoskeletal remodeling. 64787 EPS8L2 EPS8 like 2 ENSG00000177106 NA
The protein encoded by this gene belongs to the cyclin family. Through its interaction with several proteins, such as RNA polymerase II, splicing factors, and cyclin-dependent kinases, this protein functions as a regulator of the pre-mRNA splicing process, as well as in inducing apoptosis by modulating the expression of apoptotic and antiapoptotic proteins. Alternatively spliced transcript variants encoding different isoforms have been described for this gene. 81669 CCNL2 cyclin L2 ENSG00000221978 NA
Thyroglobulin (Tg) is a glycoprotein homodimer produced predominantly by the thyroid gland. It acts as a substrate for the synthesis of thyroxine and triiodothyronine as well as the storage of the inactive forms of thyroid hormone and iodine. Thyroglobulin is secreted from the endoplasmic reticulum to its site of iodination, and subsequent thyroxine biosynthesis, in the follicular lumen. Mutations in this gene cause thyroid dyshormonogenesis, manifested as goiter, and are associated with moderate to severe congenital hypothyroidism. Polymorphisms in this gene are associated with susceptibility to autoimmune thyroid diseases (AITD) such as Graves disease and Hashimoto thyroiditis. 7038 TG thyroglobulin ENSG00000042832 NA
The product of this gene belongs to the integrin alpha chain family. Integrins are heterodimeric integral membrane proteins composed of an alpha subunit and a beta subunit that function in cell surface adhesion and signaling. The encoded preproprotein is proteolytically processed to generate light and heavy chains that comprise the alpha 5 subunit. This subunit associates with the beta 1 subunit to form a fibronectin receptor. This integrin may promote tumor invasion, and higher expression of this gene may be correlated with shorter survival time in lung cancer patients. Note that the integrin alpha 5 and integrin alpha V subunits are encoded by distinct genes. 3678 ITGA5 integrin subunit alpha 5 ENSG00000161638 NA
The cystatin superfamily encompasses proteins that contain multiple cystatin-like sequences. Some of the members are active cysteine protease inhibitors, while others have lost or perhaps never acquired this inhibitory activity. There are three inhibitory families in the superfamily, including the type 1 cystatins (stefins), type 2 cystatins and kininogens. This gene encodes a stefin that functions as an intracellular thiol protease inhibitor. The protein is able to form a dimer stabilized by noncovalent forces, inhibiting papain and cathepsins l, h and b. The protein is thought to play a role in protecting against the proteases leaking from lysosomes. Evidence indicates that mutations in this gene are responsible for the primary defects in patients with progressive myoclonic epilepsy (EPM1). 1476 CSTB cystatin B ENSG00000160213 NA
This gene encodes an integral membrane protein associated with presynaptic vesicles in neuronal cells. The exact function of this protein is unclear, but studies of a similar murine protein suggest that it functions in synaptic plasticity without being required for synaptic transmission. The gene product belongs to the synaptogyrin gene family. Three alternatively spliced variants encoding three different isoforms have been identified. 9145 SYNGR1 synaptogyrin 1 ENSG00000100321 NA
NA NA NA NA ENSG00000163486 TRUE
The protein encoded by this gene is a smooth muscle myosin belonging to the myosin heavy chain family. The gene product is a subunit of a hexameric protein that consists of two heavy chain subunits and two pairs of non-identical light chain subunits. It functions as a major contractile protein, converting chemical energy into mechanical energy through the hydrolysis of ATP. The gene encoding a human ortholog of rat NUDE1 is transcribed from the reverse strand of this gene, and its 3’ end overlaps with that of the latter. The pericentric inversion of chromosome 16 [inv(16)(p13q22)] produces a chimeric transcript that encodes a protein consisting of the first 165 residues from the N terminus of core-binding factor beta in a fusion with the C-terminal portion of the smooth muscle myosin heavy chain. This chromosomal rearrangement is associated with acute myeloid leukemia of the M4Eo subtype. Alternative splicing generates isoforms that are differentially expressed, with ratios changing during muscle cell maturation. Alternatively spliced transcript variants encoding different isoforms have been identified. 4629 MYH11 myosin, heavy chain 11, smooth muscle ENSG00000133392 NA
PLCH2 is a member of the PLC-eta family of the phosphoinositide-specific phospholipase C (PLC) superfamily of enzymes that cleave PtdIns(4,5) P2 to generate second messengers inositol 1,4,5-trisphosphate and diacylglycerol (Zhou et al., 2005 [PubMed 16107206]). 9651 PLCH2 phospholipase C eta 2 ENSG00000149527 NA
The alpha (HBA) and beta (HBB) loci determine the structure of the 2 types of polypeptide chains in adult hemoglobin, Hb A. The normal adult hemoglobin tetramer consists of two alpha chains and two beta chains. Mutant beta globin causes sickle cell anemia. Absence of beta chain causes beta-zero-thalassemia. Reduced amounts of detectable beta globin causes beta-plus-thalassemia. The order of the genes in the beta-globin cluster is 5’-epsilon – gamma-G – gamma-A – delta – beta–3’. 3043 HBB hemoglobin subunit beta ENSG00000244734 NA
NA 9315 NREP neuronal regeneration related protein ENSG00000134986 NA
Guanine nucleotide dissociation stimulators (GDSs, or exchange factors), such as RALGDS, are effectors of Ras-related GTPases (see MIM 190020) that participate in signaling for a variety of cellular processes. 5900 RALGDS ral guanine nucleotide dissociation stimulator ENSG00000160271 NA
Integrins are heterodimeric proteins made up of alpha and beta subunits. At least 18 alpha and 8 beta subunits have been described in mammals. Integrin family members are membrane receptors involved in cell adhesion and recognition in a variety of processes including embryogenesis, hemostasis, tissue repair, immune response and metastatic diffusion of tumor cells. This gene encodes a beta subunit. Multiple alternatively spliced transcript variants which encode different protein isoforms have been found for this gene. 3688 ITGB1 integrin subunit beta 1 ENSG00000150093 NA
Synaptopodin is an actin-associated protein that may play a role in actin-based cell shape and motility. The name synaptopodin derives from the protein’s associations with postsynaptic densities and dendritic spines and with renal podocytes (Mundel et al., 1997 [PubMed 9314539]). 11346 SYNPO synaptopodin ENSG00000171992 NA
This gene encodes a member of the serine proteinase inhibitor (serpin) superfamily. This member is the principal inhibitor of tissue plasminogen activator (tPA) and urokinase (uPA), and hence is an inhibitor of fibrinolysis. Defects in this gene are the cause of plasminogen activator inhibitor-1 deficiency (PAI-1 deficiency), and high concentrations of the gene product are associated with thrombophilia. Alternatively spliced transcript variants encoding different isoforms have been found for this gene. 5054 SERPINE1 serpin family E member 1 ENSG00000106366 NA
NA 1153 CIRBP cold inducible RNA binding protein ENSG00000099622 NA
The protein encoded by this gene belongs to the family of P-type cation transport ATPases, and to the subfamily of Na+/K+ -ATPases. Na+/K+ -ATPase is an integral membrane protein responsible for establishing and maintaining the electrochemical gradients of Na and K ions across the plasma membrane. These gradients are essential for osmoregulation, for sodium-coupled transport of a variety of organic and inorganic molecules, and for electrical excitability of nerve and muscle. This enzyme is composed of two subunits, a large catalytic subunit (alpha) and a smaller glycoprotein subunit (beta). The catalytic subunit of Na+/K+ -ATPase is encoded by multiple genes. This gene encodes an alpha 1 subunit. Multiple transcript variants encoding different isoforms have been found for this gene. 476 ATP1A1 ATPase Na+/K+ transporting subunit alpha 1 ENSG00000163399 NA
This gene encodes a member of the semicarbazide-sensitive amine oxidase family. Copper amine oxidases catalyze the oxidative conversion of amines to aldehydes in the presence of copper and quinone cofactor. The encoded protein is localized to the cell surface, has adhesive properties as well as monoamine oxidase activity, and may be involved in leukocyte trafficking. Alterations in levels of the encoded protein may be associated with many diseases, including diabetes mellitus. A pseudogene of this gene has been described and is located approximately 9-kb downstream on the same chromosome. Alternative splicing results in multiple transcript variants. 8639 AOC3 amine oxidase, copper containing 3 ENSG00000131471 NA
The protein encoded by this gene is a member of the immunophilin protein family, which play a role in immunoregulation and basic cellular processes involving protein folding and trafficking. This encoded protein is a cis-trans prolyl isomerase that binds to the immunosuppressants FK506 and rapamycin. It is thought to mediate calcineurin inhibition. It also interacts functionally with mature hetero-oligomeric progesterone receptor complexes along with the 90 kDa heat shock protein and P23 protein. This gene has been found to have multiple polyadenylation sites. Alternative splicing results in multiple transcript variants. 2289 FKBP5 FK506 binding protein 5 ENSG00000096060 NA
NA 7074 TIAM1 T-cell lymphoma invasion and metastasis 1 ENSG00000156299 NA
This gene encodes a member of the hook-related protein family. Members of this family are characterized by an N-terminal potential microtubule binding domain, a central coiled-coil and a C-terminal Hook-related domain. The encoded protein may be involved in linking organelles to microtubules. 283234 CCDC88B coiled-coil domain containing 88B ENSG00000168071 NA
NA 57185 NIPAL3 NIPA like domain containing 3 ENSG00000001461 NA
This gene encodes a member of the ADAMTS (a disintegrin and metalloproteinase with thrombospondin motif) protein family. Members of the family share several distinct protein modules, including a propeptide region, a metalloproteinase domain, a disintegrin-like domain, and a thrombospondin type 1 (TS) motif. Individual members of this family differ in the number of C-terminal TS motifs, and some have unique C-terminal domains. The protein encoded by this gene contains two disintegrin loops and three C-terminal TS motifs and has anti-angiogenic activity. The expression of this gene may be associated with various inflammatory processes as well as development of cancer cachexia. This gene is likely to be necessary for normal growth, fertility, and organ morphology and function. 9510 ADAMTS1 ADAM metallopeptidase with thrombospondin type 1 motif 1 ENSG00000154734 NA
Members of the B class of plexins, such as PLXNB2 are transmembrane receptors that participate in axon guidance and cell migration in response to semaphorins (Perrot et al. (2002) [PubMed 12183458]). 23654 PLXNB2 plexin B2 ENSG00000196576 NA
This gene product belongs to the glutathione peroxidase family, which functions in the detoxification of hydrogen peroxide. It contains a selenocysteine (Sec) residue at its active site. The selenocysteine is encoded by the UGA codon, which normally signals translation termination. The 3’ UTR of Sec-containing genes have a common stem-loop structure, the sec insertion sequence (SECIS), which is necessary for the recognition of UGA as a Sec codon rather than as a stop signal. 2878 GPX3 glutathione peroxidase 3 ENSG00000211445 NA
NA 283450 HECTD4 HECT domain E3 ubiquitin protein ligase 4 ENSG00000173064 NA
This gene is a member of the PDK/BCKDK protein kinase family and encodes a mitochondrial protein with a histidine kinase domain. This protein is located in the matrix of the mitochondria and inhibits the pyruvate dehydrogenase complex by phosphorylating one of its subunits, thereby contributing to the regulation of glucose metabolism. Expression of this gene is regulated by glucocorticoids, retinoic acid and insulin. 5166 PDK4 pyruvate dehydrogenase kinase 4 ENSG00000004799 NA
This gene encodes the alpha chain of type VII collagen. The type VII collagen fibril, composed of three identical alpha collagen chains, is restricted to the basement zone beneath stratified squamous epithelia. It functions as an anchoring fibril between the external epithelia and the underlying stroma. Mutations in this gene are associated with all forms of dystrophic epidermolysis bullosa. In the absence of mutations, however, an acquired form of this disease can result from an autoimmune response made to type VII collagen. 1294 COL7A1 collagen type VII alpha 1 ENSG00000114270 NA
This gene encodes a transmembrane protein containing a proline-rich domain in its N-terminal half. Studies in mice suggest that it is predominantly expressed in brain and spinal cord in embryonic and postnatal stages. Mutations in this gene are associated with episodic kinesigenic dyskinesia-1. Alternatively spliced transcript variants encoding different isoforms have been found for this gene. 112476 PRRT2 proline rich transmembrane protein 2 ENSG00000167371 NA
NA 266727 MDGA1 MAM domain containing glycosylphosphatidylinositol anchor 1 ENSG00000112139 NA
The protein encoded by this gene is a mitogen that is secreted by vascular endothelial cells. The encoded protein plays a role in chondrocyte proliferation and differentiation, cell adhesion in many cell types, and is related to platelet-derived growth factor. Certain polymorphisms in this gene have been linked with a higher incidence of systemic sclerosis. 1490 CTGF connective tissue growth factor ENSG00000118523 NA
This gene encodes an extracellular matrix protein with a spatially and temporally restricted tissue distribution. This protein is homohexameric with disulfide-linked subunits, and contains multiple EGF-like and fibronectin type-III domains. It is implicated in guidance of migrating neurons as well as axons during development, synaptic plasticity, and neuronal regeneration. 3371 TNC tenascin C ENSG00000041982 NA
This gene encodes a guanine nucleotide exchange factor that interacts specifically with the GTP-bound Rac1 and plays a role in the Rho/Rac signaling pathways. A variant in this gene was associated with osteoarthritis. Alternative splicing results in multiple transcript variants. 23263 MCF2L MCF.2 cell line derived transforming sequence like ENSG00000126217 NA
This gene encodes a calmodulin- and actin-binding protein that plays an essential role in the regulation of smooth muscle and nonmuscle contraction. The conserved domain of this protein possesses the binding activities to Ca(2+)-calmodulin, actin, tropomyosin, myosin, and phospholipids. This protein is a potent inhibitor of the actin-tropomyosin activated myosin MgATPase, and serves as a mediating factor for Ca(2+)-dependent inhibition of smooth muscle contraction. Alternative splicing of this gene results in multiple transcript variants encoding distinct isoforms. 800 CALD1 caldesmon 1 ENSG00000122786 NA
The protein encoded by this gene is a member of the keratin gene family. The keratins are intermediate filament proteins responsible for the structural integrity of epithelial cells and are subdivided into cytokeratins and hair keratins. Most of the type I cytokeratins consist of acidic proteins which are arranged in pairs of heterotypic keratin chains and are clustered in a region on chromosome 17q21.2. 3866 KRT15 keratin 15 ENSG00000171346 NA
This gene is a member of the TIMP gene family. The proteins encoded by this gene family are natural inhibitors of the matrix metalloproteinases, a group of peptidases involved in degradation of the extracellular matrix. In addition to an inhibitory role against metalloproteinases, the encoded protein has a unique role among TIMP family members in its ability to directly suppress the proliferation of endothelial cells. As a result, the encoded protein may be critical to the maintenance of tissue homeostasis by suppressing the proliferation of quiescent tissues in response to angiogenic factors, and by inhibiting protease activity in tissues undergoing remodelling of the extracellular matrix. 7077 TIMP2 TIMP metallopeptidase inhibitor 2 ENSG00000035862 NA
NA 4162 MCAM melanoma cell adhesion molecule ENSG00000076706 NA
This gene encodes a protein that contains several helicase family domains. Mutations in this gene have been found in some patients with the CHARGE syndrome. Two transcript variants encoding different isoforms have been found for this gene. 55636 CHD7 chromodomain helicase DNA binding protein 7 ENSG00000171316 NA
This gene encodes a protein that activates the nuclear factor kappa B (NFKB1) signaling pathway. Mutations in this gene are associated with autosomal recessive distal spinal muscular atrophy. Multiple transcript variants encoding different isoforms have been found for this gene. 57449 PLEKHG5 pleckstrin homology and RhoGEF domain containing G5 ENSG00000171680 NA
NA 130733 TMEM178A transmembrane protein 178A ENSG00000152154 NA
The protein encoded by this gene shares similarity with the product of Drosophila syd gene, required for the functional interaction of kinesin I with axonal cargo. Studies of the similar gene in mouse suggested that this protein may interact with, and regulate the activity of numerous protein kinases of the JNK signaling pathway, and thus function as a scaffold protein in neuronal cells. The C. elegans counterpart of this gene is found to regulate synaptic vesicle transport possibly by integrating JNK signaling and kinesin-1 transport. Several alternatively spliced transcript variants of this gene have been described, but the full-length nature of some of these variants has not been determined. 23162 MAPK8IP3 mitogen-activated protein kinase 8 interacting protein 3 ENSG00000138834 NA
The protein encoded by this gene belongs to the perilipin family, members of which coat intracellular lipid storage droplets. This protein is associated with the lipid globule surface membrane material, and may be involved in development and maintenance of adipose tissue. However, it is not restricted to adipocytes as previously thought, but is found in a wide range of cultured cell lines, including fibroblasts, endothelial and epithelial cells, and tissues, such as lactating mammary gland, adrenal cortex, Sertoli and Leydig cells, and hepatocytes in alcoholic liver cirrhosis, suggesting that it may serve as a marker of lipid accumulation in diverse cell types and diseases. Alternatively spliced transcript variants have been found for this gene. 123 PLIN2 perilipin 2 ENSG00000147872 NA
The protein encoded by this gene is a secreted, extracellular matrix protein containing an Arg-Gly-Asp (RGD) motif and calcium-binding EGF-like domains. It promotes adhesion of endothelial cells through interaction of integrins and the RGD motif. It is prominently expressed in developing arteries but less so in adult vessels. However, its expression is reinduced in balloon-injured vessels and atherosclerotic lesions, notably in intimal vascular smooth muscle cells and endothelial cells. Therefore, the protein encoded by this gene may play a role in vascular development and remodeling. Defects in this gene are a cause of autosomal dominant cutis laxa, autosomal recessive cutis laxa type I (CL type I), and age-related macular degeneration type 3 (ARMD3). 10516 FBLN5 fibulin 5 ENSG00000140092 NA
Vinculin is a cytoskeletal protein associated with cell-cell and cell-matrix junctions, where it is thought to function as one of several interacting proteins involved in anchoring F-actin to the membrane. Defects in VCL are the cause of cardiomyopathy dilated type 1W. Dilated cardiomyopathy is a disorder characterized by ventricular dilation and impaired systolic function, resulting in congestive heart failure and arrhythmia. Multiple alternatively spliced transcript variants have been found for this gene, but the biological validity of some variants has not been determined. 7414 VCL vinculin ENSG00000035403 NA
This locus encodes a heat shock protein. The encoded protein likely plays a role in smooth muscle relaxation. 126393 HSPB6 heat shock protein family B (small) member 6 ENSG00000004776 NA
This gene encodes a syntaxin-binding protein. The encoded protein appears to play a role in release of neurotransmitters via regulation of syntaxin, a transmembrane attachment protein receptor. Mutations in this gene have been associated with infantile epileptic encephalopathy-4. Alternatively spliced transcript variants have been described. 6812 STXBP1 syntaxin binding protein 1 ENSG00000136854 NA
Members of the perilipin family, such as PLIN4, coat intracellular lipid storage droplets (Wolins et al., 2003 [PubMed 12840023]). 729359 PLIN4 perilipin 4 ENSG00000167676 NA
This gene encodes a C2H2-type zinc finger protein which acts as a transcriptional repressor of genes involved in neuronal development. The encoded protein recognizes a specific sequence motif and recruits components of chromatin to target genes. Alternative splicing results in multiple transcript variants. 10472 ZBTB18 zinc finger and BTB domain containing 18 ENSG00000179456 NA
This gene encodes a protein that plays a role in desmosome assembly, cell adhesion, cytoskeletal organization, and epidermal differentiation. This protein co-localizes with desmoplakin and the cytolinker protein periplakin. In general, this protein localizes to the nucleus, desmosomes, cell membrane, and cortical actin-based structures. Some isoforms of this protein also associate with microtubules. Alternative splicing results in multiple transcript variants encoding distinct isoforms. Additional splice variants have been described but their biological validity has not been verified. 23254 KAZN kazrin, periplakin interacting protein ENSG00000189337 NA
NA 100507347 VIM-AS1 VIM antisense RNA 1 ENSG00000229124 NA
NA 25989 ULK3 unc-51 like kinase 3 ENSG00000140474 NA
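# Save the Ensembl IDs queried for factor 17, one per line (rows flagged notfound are included).
write.table(as.factor(out$query), paste0("../utilities/GTEX2013_sparse_load_sqrt/gene_names_clus_", 17, ".txt"),
            col.names = FALSE, row.names = FALSE, quote = FALSE);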
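
The file written above keeps every queried ID, including those the annotation query could not resolve (rows where notfound is TRUE, such as ENSG00000256309). If only resolved genes are wanted downstream, a filter along the following lines may help. This is a minimal sketch, not part of the original analysis: `write_found_genes`, `toy`, and `toy_path` are hypothetical names, and the toy data frame stands in for the real query result so that the example runs without network access. It assumes the notfound column is TRUE for unresolved IDs, NA otherwise, and may be absent when every ID resolves (as the Factor 1 table suggests).

```r
# Hypothetical helper (an assumption, not the author's code): drop Ensembl IDs
# that the annotation query could not resolve, then write the rest to a file.
write_found_genes <- function(annot, path) {
  annot <- as.data.frame(annot)
  # Keep rows where notfound is NA or FALSE; skip the filter if the
  # column does not exist.
  if ("notfound" %in% names(annot)) {
    annot <- annot[is.na(annot$notfound) | !annot$notfound, , drop = FALSE]
  }
  write.table(annot$query, path,
              col.names = FALSE, row.names = FALSE, quote = FALSE)
  invisible(annot$query)
}

# Toy stand-in for the query result, using three rows of the Factor 17 table.
toy <- data.frame(
  query    = c("ENSG00000149639", "ENSG00000256309", "ENSG00000111674"),
  symbol   = c("SOGA1", NA, "ENO2"),
  notfound = c(NA, TRUE, NA),
  stringsAsFactors = FALSE
)
toy_path <- tempfile(fileext = ".txt")
kept <- write_found_genes(toy, toy_path)
```

With the real result, the call would be `write_found_genes(out, ...)` with the same output path as above; whether unresolved IDs should be dropped depends on what the downstream utilities expect.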

Factor 18 Annotations

out <- mygene::queryMany(gene_list[18,],  scopes="ensembl.gene", fields=c("name", "summary", "symbol"), species="human");
## Finished
## Pass returnall=TRUE to return lists of duplicate or missing query terms.
kable(as.data.frame(out))
X_id summary name symbol query notfound
2512 This gene encodes the light subunit of the ferritin protein. Ferritin is the major intracellular iron storage protein in prokaryotes and eukaryotes. It is composed of 24 subunits of the heavy and light ferritin chains. Variation in ferritin subunit composition may affect the rates of iron uptake and release in different tissues. A major function of ferritin is the storage of iron in a soluble and nontoxic state. Defects in this light chain ferritin gene are associated with several neurodegenerative diseases and hyperferritinemia-cataract syndrome. This gene has multiple pseudogenes. ferritin, light polypeptide FTL ENSG00000087086 NA
3860 The protein encoded by this gene is a member of the keratin gene family. The keratins are intermediate filament proteins responsible for the structural integrity of epithelial cells and are subdivided into cytokeratins and hair keratins. Most of the type I cytokeratins consist of acidic proteins which are arranged in pairs of heterotypic keratin chains. This type I cytokeratin is paired with keratin 4 and expressed in the suprabasal layers of non-cornified stratified epithelia. Mutations in this gene and keratin 4 have been associated with the autosomal dominant disorder White Sponge Nevus. The type I cytokeratins are clustered in a region of chromosome 17q21.2. Alternative splicing of this gene results in multiple transcript variants; however, not all variants have been described. keratin 13 KRT13 ENSG00000171401 NA
8490 This gene encodes a member of the regulators of G protein signaling (RGS) family. The RGS proteins are signal transduction molecules which are involved in the regulation of heterotrimeric G proteins by acting as GTPase activators. This gene is a hypoxia-inducible factor-1 dependent, hypoxia-induced gene which is involved in the induction of endothelial apoptosis. This gene is also one of three genes on chromosome 1q contributing to elevated blood pressure. Alternatively spliced transcript variants have been identified. regulator of G-protein signaling 5 RGS5 ENSG00000143248 NA
2878 This gene product belongs to the glutathione peroxidase family, which functions in the detoxification of hydrogen peroxide. It contains a selenocysteine (Sec) residue at its active site. The selenocysteine is encoded by the UGA codon, which normally signals translation termination. The 3’ UTR of Sec-containing genes have a common stem-loop structure, the sec insertion sequence (SECIS), which is necessary for the recognition of UGA as a Sec codon rather than as a stop signal. glutathione peroxidase 3 GPX3 ENSG00000211445 NA
NA NA NA NA ENSG00000117289 TRUE
2495 This gene encodes the heavy subunit of ferritin, the major intracellular iron storage protein in prokaryotes and eukaryotes. It is composed of 24 subunits of the heavy and light ferritin chains. Variation in ferritin subunit composition may affect the rates of iron uptake and release in different tissues. A major function of ferritin is the storage of iron in a soluble and nontoxic state. Defects in ferritin proteins are associated with several neurodegenerative diseases. This gene has multiple pseudogenes. Several alternatively spliced transcript variants have been observed, but their biological validity has not been determined. ferritin heavy chain 1 FTH1 ENSG00000167996 NA
59 The protein encoded by this gene belongs to the actin family of proteins, which are highly conserved proteins that play a role in cell motility, structure and integrity. Alpha, beta and gamma actin isoforms have been identified, with alpha actins being a major constituent of the contractile apparatus, while beta and gamma actins are involved in the regulation of cell motility. This actin is an alpha actin that is found in smooth muscle. Defects in this gene cause aortic aneurysm familial thoracic type 6. Multiple alternatively spliced variants, encoding the same protein, have been identified. actin, alpha 2, smooth muscle, aorta ACTA2 ENSG00000107796 NA
3851 The protein encoded by this gene is a member of the keratin gene family. The type II cytokeratins consist of basic or neutral proteins which are arranged in pairs of heterotypic keratin chains coexpressed during differentiation of simple and stratified epithelial tissues. This type II cytokeratin is specifically expressed in differentiated layers of the mucosal and esophageal epithelia with family member KRT13. Mutations in these genes have been associated with White Sponge Nevus, characterized by oral, esophageal, and anal leukoplakia. The type II cytokeratins are clustered in a region of chromosome 12q12-q13. keratin 4 KRT4 ENSG00000170477 NA
5310 This gene encodes a member of the polycystin protein family. The encoded glycoprotein contains a large N-terminal extracellular region, multiple transmembrane domains and a cytoplasmic C-tail. It is an integral membrane protein that functions as a regulator of calcium permeable cation channels and intracellular calcium homoeostasis. It is also involved in cell-cell/matrix interactions and may modulate G-protein-coupled signal-transduction pathways. It plays a role in renal tubular development, and mutations in this gene cause autosomal dominant polycystic kidney disease type 1 (ADPKD1). ADPKD1 is characterized by the growth of fluid-filled cysts that replace normal renal tissue and result in end-stage renal failure. Splice variants encoding different isoforms have been noted for this gene. Also, six pseudogenes, closely linked in a known duplicated region on chromosome 16p, have been described. polycystin 1, transient receptor potential channel interacting PKD1 ENSG00000008710 NA
567 This gene encodes a serum protein found in association with the major histocompatibility complex (MHC) class I heavy chain on the surface of nearly all nucleated cells. The protein has a predominantly beta-pleated sheet structure that can form amyloid fibrils in some pathological conditions. The encoded antimicrobial protein displays antibacterial activity in amniotic fluid. A mutation in this gene has been shown to result in hypercatabolic hypoproteinemia. beta-2-microglobulin B2M ENSG00000166710 NA
7169 This gene encodes beta-tropomyosin, a member of the actin filament binding protein family, and mainly expressed in slow, type 1 muscle fibers. Mutations in this gene can alter the expression of other sarcomeric tropomyosin proteins, and cause cap disease, nemaline myopathy and distal arthrogryposis syndromes. Alternatively spliced transcript variants encoding different isoforms have been found for this gene. tropomyosin 2 (beta) TPM2 ENSG00000198467 NA
4162 NA melanoma cell adhesion molecule MCAM ENSG00000076706 NA
3858 This gene encodes a member of the type I (acidic) cytokeratin family, which belongs to the superfamily of intermediate filament (IF) proteins. Keratins are heteropolymeric structural proteins which form the intermediate filament. These filaments, along with actin microfilaments and microtubules, compose the cytoskeleton of epithelial cells. Mutations in this gene are associated with epidermolytic hyperkeratosis. This gene is located within a cluster of keratin family members on chromosome 17q21. keratin 10 KRT10 ENSG00000186395 NA
6707 NA small proline rich protein 3 SPRR3 ENSG00000163209 NA
8516 Integrins are heterodimeric transmembrane receptor proteins that mediate numerous cellular processes including cell adhesion, cytoskeletal rearrangement, and activation of cell signaling pathways. Integrins are composed of alpha and beta subunits. This gene encodes the alpha 8 subunit of the heterodimeric integrin alpha8beta1 protein. The encoded protein is a single-pass type 1 membrane protein that contains multiple FG-GAP repeats. This repeat is predicted to fold into a beta propeller structure. This gene regulates the recruitment of mesenchymal cells into epithelial structures, mediates cell-cell interactions, and regulates neurite outgrowth of sensory and motor neurons. The integrin alpha8beta1 protein thus plays an important role in wound-healing and organogenesis. Mutations in this gene have been associated with renal hypodysplasia/aplasia-1 (RHDA1) and with several animal models of chronic kidney disease. Alternate splicing results in multiple transcript variants encoding distinct isoforms. integrin subunit alpha 8 ITGA8 ENSG00000077943 NA
2006 This gene encodes a protein that is one of the two components of elastic fibers. The encoded protein is rich in hydrophobic amino acids such as glycine and proline, which form mobile hydrophobic regions bounded by crosslinks between lysine residues. Deletions and mutations in this gene are associated with supravalvular aortic stenosis (SVAS) and autosomal dominant cutis laxa. Multiple transcript variants encoding different isoforms have been found for this gene. elastin ELN ENSG00000049540 NA
4854 This gene encodes the third discovered human homologue of the Drosophila melanogaster type I membrane protein notch. In Drosophila, notch interaction with its cell-bound ligands (delta, serrate) establishes an intercellular signalling pathway that plays a key role in neural development. Homologues of the notch-ligands have also been identified in human, but precise interactions between these ligands and the human notch homologues remain to be determined. Mutations in NOTCH3 have been identified as the underlying cause of cerebral autosomal dominant arteriopathy with subcortical infarcts and leukoencephalopathy (CADASIL). notch 3 NOTCH3 ENSG00000074181 NA
3912 Laminins, a family of extracellular matrix glycoproteins, are the major noncollagenous constituent of basement membranes. They have been implicated in a wide variety of biological processes including cell adhesion, differentiation, migration, signaling, neurite outgrowth and metastasis. Laminins are composed of 3 non identical chains: laminin alpha, beta and gamma (formerly A, B1, and B2, respectively) and they form a cruciform structure consisting of 3 short arms, each formed by a different chain, and a long arm composed of all 3 chains. Each laminin chain is a multidomain protein encoded by a distinct gene. Several isoforms of each chain have been described. Different alpha, beta and gamma chain isomers combine to give rise to different heterotrimeric laminin isoforms which are designated by Arabic numerals in the order of their discovery, i.e. alpha1beta1gamma1 heterotrimer is laminin 1. The biological functions of the different chains and trimer molecules are largely unknown, but some of the chains have been shown to differ with respect to their tissue distribution, presumably reflecting diverse functions in vivo. This gene encodes the beta chain isoform laminin, beta 1. The beta 1 chain has 7 structurally distinct domains which it shares with other beta chain isomers. The C-terminal helical region containing domains I and II are separated by domain alpha, domains III and V contain several EGF-like repeats, and domains IV and VI have a globular conformation. Laminin, beta 1 is expressed in most tissues that produce basement membranes, and is one of the 3 chains constituting laminin 1, the first laminin isolated from Engelbreth-Holm-Swarm (EHS) tumor. A sequence in the beta 1 chain that is involved in cell attachment, chemotaxis, and binding to the laminin receptor was identified and shown to have the capacity to inhibit metastasis. laminin subunit beta 1 LAMB1 ENSG00000091136 NA
7057 The protein encoded by this gene is a subunit of a disulfide-linked homotrimeric protein. This protein is an adhesive glycoprotein that mediates cell-to-cell and cell-to-matrix interactions. This protein can bind to fibrinogen, fibronectin, laminin, type V collagen and integrins alpha-V/beta-1. This protein has been shown to play roles in platelet aggregation, angiogenesis, and tumorigenesis. thrombospondin 1 THBS1 ENSG00000137801 NA
3043 The alpha (HBA) and beta (HBB) loci determine the structure of the 2 types of polypeptide chains in adult hemoglobin, Hb A. The normal adult hemoglobin tetramer consists of two alpha chains and two beta chains. Mutant beta globin causes sickle cell anemia. Absence of beta chain causes beta-zero-thalassemia. Reduced amounts of detectable beta globin causes beta-plus-thalassemia. The order of the genes in the beta-globin cluster is 5’-epsilon – gamma-G – gamma-A – delta – beta–3’. hemoglobin subunit beta HBB ENSG00000244734 NA
3133 HLA-E belongs to the HLA class I heavy chain paralogues. This class I molecule is a heterodimer consisting of a heavy chain and a light chain (beta-2 microglobulin). The heavy chain is anchored in the membrane. HLA-E binds a restricted subset of peptides derived from the leader peptides of other class I molecules. The heavy chain is approximately 45 kDa and its gene contains 8 exons. Exon one encodes the leader peptide, exons 2 and 3 encode the alpha1 and alpha2 domains, which both bind the peptide, exon 4 encodes the alpha3 domain, exon 5 encodes the transmembrane region, and exons 6 and 7 encode the cytoplasmic tail. major histocompatibility complex, class I, E HLA-E ENSG00000204592 NA
ENSG00000229732 NA NA AC019349.5 ENSG00000229732 NA
55160 ARHGEF10L is a member of the RhoGEF family of guanine nucleotide exchange factors (GEFs) that activate Rho GTPases (Winkler et al., 2005 [PubMed 16112081]). Rho guanine nucleotide exchange factor 10 like ARHGEF10L ENSG00000074964 NA
6711 Spectrin is an actin crosslinking and molecular scaffold protein that links the plasma membrane to the actin cytoskeleton, and functions in the determination of cell shape, arrangement of transmembrane proteins, and organization of organelles. It is composed of two antiparallel dimers of alpha- and beta- subunits. This gene is one member of a family of beta-spectrin genes. The encoded protein contains an N-terminal actin-binding domain, and 17 spectrin repeats which are involved in dimer formation. Multiple transcript variants encoding different isoforms have been found for this gene. spectrin beta, non-erythrocytic 1 SPTBN1 ENSG00000115306 NA
23770 The protein encoded by this gene is a member of the immunophilin protein family, which play a role in immunoregulation and basic cellular processes involving protein folding and trafficking. Unlike the other members of the family, this encoded protein does not seem to have PPIase/rotamase activity. It may have a role in neurons associated with memory function. FK506 binding protein 8 FKBP8 ENSG00000105701 NA
4856 The protein encoded by this gene is a small secreted cysteine-rich protein and a member of the CCN family of regulatory proteins. CCN family proteins associate with the extracellular matrix and play an important role in cardiovascular and skeletal development, fibrosis and cancer development. nephroblastoma overexpressed NOV ENSG00000136999 NA
4629 The protein encoded by this gene is a smooth muscle myosin belonging to the myosin heavy chain family. The gene product is a subunit of a hexameric protein that consists of two heavy chain subunits and two pairs of non-identical light chain subunits. It functions as a major contractile protein, converting chemical energy into mechanical energy through the hydrolysis of ATP. The gene encoding a human ortholog of rat NUDE1 is transcribed from the reverse strand of this gene, and its 3’ end overlaps with that of the latter. The pericentric inversion of chromosome 16 [inv(16)(p13q22)] produces a chimeric transcript that encodes a protein consisting of the first 165 residues from the N terminus of core-binding factor beta in a fusion with the C-terminal portion of the smooth muscle myosin heavy chain. This chromosomal rearrangement is associated with acute myeloid leukemia of the M4Eo subtype. Alternative splicing generates isoforms that are differentially expressed, with ratios changing during muscle cell maturation. Alternatively spliced transcript variants encoding different isoforms have been identified. myosin, heavy chain 11, smooth muscle MYH11 ENSG00000133392 NA
ENSG00000180139 NA ACTA2 antisense RNA 1 ACTA2-AS1 ENSG00000180139 NA
4069 This gene encodes human lysozyme, whose natural substrate is the bacterial cell wall peptidoglycan (cleaving the beta[1-4]glycosidic linkages between N-acetylmuramic acid and N-acetylglucosamine). Lysozyme is one of the antimicrobial agents found in human milk, and is also present in spleen, lung, kidney, white blood cells, plasma, saliva, and tears. The protein has antibacterial activity against a number of bacterial species. Missense mutations in this gene have been identified in heritable renal amyloidosis. lysozyme LYZ ENSG00000090382 NA
6556 This gene is a member of the solute carrier family 11 (proton-coupled divalent metal ion transporters) family and encodes a multi-pass membrane protein. The protein functions as a divalent transition metal (iron and manganese) transporter involved in iron metabolism and host resistance to certain pathogens. Mutations in this gene have been associated with susceptibility to infectious diseases such as tuberculosis and leprosy, and inflammatory diseases such as rheumatoid arthritis and Crohn disease. Alternatively spliced variants that encode different protein isoforms have been described but the full-length nature of only one has been determined. solute carrier family 11 member 1 SLC11A1 ENSG00000018280 NA
1634 This gene encodes a member of the small leucine-rich proteoglycan family of proteins. Alternative splicing results in multiple transcript variants, at least one of which encodes a preproprotein that is proteolytically processed to generate the mature protein. This protein plays a role in collagen fibril assembly. Binding of this protein to multiple cell surface receptors mediates its role in tumor suppression, including a stimulatory effect on autophagy and inflammation and an inhibitory effect on angiogenesis and tumorigenesis. This gene and the related gene biglycan are thought to be the result of a gene duplication. Mutations in this gene are associated with congenital stromal corneal dystrophy in human patients. decorin DCN ENSG00000011465 NA
1293 This gene encodes the alpha-3 chain, one of the three alpha chains of type VI collagen, a beaded filament collagen found in most connective tissues. The alpha-3 chain of type VI collagen is much larger than the alpha-1 and -2 chains. This difference in size is largely due to an increase in the number of subdomains, similar to von Willebrand Factor type A domains, that are found in the amino terminal globular domain of all the alpha chains. These domains have been shown to bind extracellular matrix proteins, an interaction that explains the importance of this collagen in organizing matrix components. Mutations in the type VI collagen genes are associated with Bethlem myopathy, a rare autosomal dominant proximal myopathy with early childhood onset. Mutations in this gene are also a cause of Ullrich congenital muscular dystrophy, also referred to as Ullrich scleroatonic muscular dystrophy, an autosomal recessive congenital myopathy that is more severe than Bethlem myopathy. Multiple transcript variants have been identified, but the full-length nature of only some of these variants has been described. collagen type VI alpha 3 chain COL6A3 ENSG00000163359 NA
85301 This gene encodes a member of the fibrillar collagen family, and plays a role during the calcification of cartilage and the transition of cartilage to bone. The encoded protein product is a preproprotein. It includes an N-terminal signal peptide, which is followed by an N-terminal propeptide, mature peptide and a C-terminal propeptide. The N-terminal propeptide contains thrombospondin N-terminal-like and laminin G-like domains. The mature peptide is a major triple-helical region. The C-terminal propeptide, also known as COLFI domain, plays crucial roles in tissue growth and repair. Mutations in this gene cause Steel syndrome. Alternatively spliced transcript variants have been found, but the full-length nature of some variants has not been determined. collagen type XXVII alpha 1 COL27A1 ENSG00000196739 NA
2194 The enzyme encoded by this gene is a multifunctional protein. Its main function is to catalyze the synthesis of palmitate from acetyl-CoA and malonyl-CoA, in the presence of NADPH, into long-chain saturated fatty acids. In some cancer cell lines, this protein has been found to be fused with estrogen receptor-alpha (ER-alpha), in which the N-terminus of FAS is fused in-frame with the C-terminus of ER-alpha. fatty acid synthase FASN ENSG00000169710 NA
60 This gene encodes one of six different actin proteins. Actins are highly conserved proteins that are involved in cell motility, structure, and integrity. This actin is a major constituent of the contractile apparatus and one of the two nonmuscle cytoskeletal actins. actin, beta ACTB ENSG00000075624 NA
5730 The protein encoded by this gene is a glutathione-independent prostaglandin D synthase that catalyzes the conversion of prostaglandin H2 (PGH2) to prostaglandin D2 (PGD2). PGD2 functions as a neuromodulator as well as a trophic factor in the central nervous system. PGD2 is also involved in smooth muscle contraction/relaxation and is a potent inhibitor of platelet aggregation. This gene is preferentially expressed in brain. Studies with transgenic mice overexpressing this gene suggest that this gene may also be involved in the regulation of non-rapid eye movement sleep. prostaglandin D2 synthase PTGDS ENSG00000107317 NA
1476 The cystatin superfamily encompasses proteins that contain multiple cystatin-like sequences. Some of the members are active cysteine protease inhibitors, while others have lost or perhaps never acquired this inhibitory activity. There are three inhibitory families in the superfamily, including the type 1 cystatins (stefins), type 2 cystatins and kininogens. This gene encodes a stefin that functions as an intracellular thiol protease inhibitor. The protein is able to form a dimer stabilized by noncovalent forces, inhibiting papain and cathepsins L, H and B. The protein is thought to play a role in protecting against the proteases leaking from lysosomes. Evidence indicates that mutations in this gene are responsible for the primary defects in patients with progressive myoclonic epilepsy (EPM1). cystatin B CSTB ENSG00000160213 NA
72 Actins are highly conserved proteins that are involved in various types of cell motility and in the maintenance of the cytoskeleton. Three types of actins, alpha, beta and gamma, have been identified in vertebrates. Alpha actins are found in muscle tissues and are a major constituent of the contractile apparatus. The beta and gamma actins co-exist in most cell types as components of the cytoskeleton and as mediators of internal cell motility. This gene encodes actin gamma 2; a smooth muscle actin found in enteric tissues. Alternative splicing results in multiple transcript variants encoding distinct isoforms. Based on similarity to peptide cleavage of related actins, the mature protein of this gene is formed by removal of two N-terminal peptides. actin, gamma 2, smooth muscle, enteric ACTG2 ENSG00000163017 NA
100129518 NA uncharacterized LOC100129518 LOC100129518 ENSG00000112096 NA
6648 This gene is a member of the iron/manganese superoxide dismutase family. It encodes a mitochondrial protein that forms a homotetramer and binds one manganese ion per subunit. This protein binds to the superoxide byproducts of oxidative phosphorylation and converts them to hydrogen peroxide and diatomic oxygen. Mutations in this gene have been associated with idiopathic cardiomyopathy (IDC), premature aging, sporadic motor neuron disease, and cancer. Alternative splicing of this gene results in multiple transcript variants. A related pseudogene has been identified on chromosome 1. superoxide dismutase 2, mitochondrial SOD2 ENSG00000112096 NA
32 Acetyl-CoA carboxylase (ACC) is a complex multifunctional enzyme system. ACC is a biotin-containing enzyme which catalyzes the carboxylation of acetyl-CoA to malonyl-CoA, the rate-limiting step in fatty acid synthesis. ACC-beta is thought to control fatty acid oxidation by means of the ability of malonyl-CoA to inhibit carnitine-palmitoyl-CoA transferase I, the rate-limiting step in fatty acid uptake and oxidation by mitochondria. ACC-beta may be involved in the regulation of fatty acid oxidation, rather than fatty acid biosynthesis. There is evidence for the presence of two ACC-beta isoforms. acetyl-CoA carboxylase beta ACACB ENSG00000076555 NA
80781 This gene encodes the alpha chain of type XVIII collagen. This collagen is one of the multiplexins, extracellular matrix proteins that contain multiple triple-helix domains (collagenous domains) interrupted by non-collagenous domains. A long isoform of the protein has an N-terminal domain that is homologous to the extracellular part of frizzled receptors. Proteolytic processing at several endogenous cleavage sites in the C-terminal domain results in production of endostatin, a potent antiangiogenic protein that is able to inhibit angiogenesis and tumor growth. Mutations in this gene are associated with Knobloch syndrome. The main features of this syndrome involve retinal abnormalities, so type XVIII collagen may play an important role in retinal structure and in neural tube closure. Alternative splicing results in multiple transcript variants. collagen type XVIII alpha 1 chain COL18A1 ENSG00000182871 NA
2180 The protein encoded by this gene is an isozyme of the long-chain fatty-acid-coenzyme A ligase family. Although differing in substrate specificity, subcellular localization, and tissue distribution, all isozymes of this family convert free long-chain fatty acids into fatty acyl-CoA esters, and thereby play a key role in lipid biosynthesis and fatty acid degradation. Several transcript variants encoding different isoforms have been found for this gene. acyl-CoA synthetase long-chain family member 1 ACSL1 ENSG00000151726 NA
7078 This gene belongs to the TIMP gene family. The proteins encoded by this gene family are inhibitors of the matrix metalloproteinases, a group of peptidases involved in degradation of the extracellular matrix (ECM). Expression of this gene is induced in response to mitogenic stimulation and this netrin domain-containing protein is localized to the ECM. Mutations in this gene have been associated with the autosomal dominant disorder Sorsby’s fundus dystrophy. TIMP metallopeptidase inhibitor 3 TIMP3 ENSG00000100234 NA
4638 This gene, a muscle member of the immunoglobulin gene superfamily, encodes myosin light chain kinase which is a calcium/calmodulin dependent enzyme. This kinase phosphorylates myosin regulatory light chains to facilitate myosin interaction with actin filaments to produce contractile activity. This gene encodes both smooth muscle and nonmuscle isoforms. In addition, using a separate promoter in an intron in the 3’ region, it encodes telokin, a small protein identical in sequence to the C-terminus of myosin light chain kinase, that is independently expressed in smooth muscle and functions to stabilize unphosphorylated myosin filaments. A pseudogene is located on the p arm of chromosome 3. Four transcript variants that produce four isoforms of the calcium/calmodulin dependent enzyme have been identified as well as two transcripts that produce two isoforms of telokin. Additional variants have been identified but lack full length transcripts. myosin light chain kinase MYLK ENSG00000065534 NA
8497 PPFIA4, or liprin-alpha-4, belongs to the liprin-alpha gene family. See liprin-alpha-1 (LIP1, or PPFIA1; MIM 611054) for background on liprins. PTPRF interacting protein alpha 4 PPFIA4 ENSG00000143847 NA
4256 The protein encoded by this gene is secreted and likely acts as an inhibitor of bone formation. The encoded protein is found in the organic matrix of bone and cartilage. Defects in this gene are a cause of Keutel syndrome (KS). Two transcript variants encoding different isoforms have been found for this gene. matrix Gla protein MGP ENSG00000111341 NA
4642 NA myosin ID MYO1D ENSG00000176658 NA
2876 This gene encodes a member of the glutathione peroxidase family. Glutathione peroxidase functions in the detoxification of hydrogen peroxide, and is one of the most important antioxidant enzymes in humans. This protein is one of only a few proteins known in higher vertebrates to contain selenocysteine, which occurs at the active site of glutathione peroxidase and is coded by UGA, which normally functions as a translation termination codon. In addition, this protein is characterized by a polyalanine sequence polymorphism in the N-terminal region, which includes three alleles with five, six or seven alanine (ALA) repeats in this sequence. The allele with five ALA repeats is significantly associated with breast cancer risk. Two alternatively spliced transcript variants encoding distinct isoforms have been found for this gene. glutathione peroxidase 1 GPX1 ENSG00000233276 NA
4946 The protein encoded by this gene belongs to the ornithine decarboxylase antizyme family, which plays a role in cell growth and proliferation by regulating intracellular polyamine levels. Expression of antizymes requires +1 ribosomal frameshifting, which is enhanced by high levels of polyamines. Antizymes in turn bind to and inhibit ornithine decarboxylase (ODC), the key enzyme in polyamine biosynthesis; thus, completing the auto-regulatory circuit. This gene encodes antizyme 1, the first member of the antizyme family, that has broad tissue distribution, and negatively regulates intracellular polyamine levels by binding to and targeting ODC for degradation, as well as inhibiting polyamine uptake. Antizyme 1 mRNA contains two potential in-frame AUGs; and studies in rat suggest that alternative use of the two translation initiation sites results in N-terminally distinct protein isoforms with different subcellular localization. Alternatively spliced transcript variants have also been noted for this gene. ornithine decarboxylase antizyme 1 OAZ1 ENSG00000104904 NA
3315 The protein encoded by this gene is induced by environmental stress and developmental changes. The encoded protein is involved in stress resistance and actin organization and translocates from the cytoplasm to the nucleus upon stress induction. Defects in this gene are a cause of Charcot-Marie-Tooth disease type 2F (CMT2F) and distal hereditary motor neuropathy (dHMN). heat shock protein family B (small) member 1 HSPB1 ENSG00000106211 NA
9645 NA microtubule associated monooxygenase, calponin and LIM domain containing 2 MICAL2 ENSG00000133816 NA
290 Aminopeptidase N is located in the small-intestinal and renal microvillar membrane, and also in other plasma membranes. In the small intestine aminopeptidase N plays a role in the final digestion of peptides generated from hydrolysis of proteins by gastric and pancreatic proteases. Its function in proximal tubular epithelial cells and other cell types is less clear. The large extracellular carboxyterminal domain contains a pentapeptide consensus sequence characteristic of members of the zinc-binding metalloproteinase superfamily. Sequence comparisons with known enzymes of this class showed that CD13 and aminopeptidase N are identical. The latter enzyme was thought to be involved in the metabolism of regulatory peptides by diverse cell types, including small intestinal and renal tubular epithelial cells, macrophages, granulocytes, and synaptic membranes from the CNS. Human aminopeptidase N is a receptor for one strain of human coronavirus that is an important cause of upper respiratory tract infections. Defects in this gene appear to be a cause of various types of leukemia or lymphoma. alanyl aminopeptidase, membrane ANPEP ENSG00000166825 NA
266727 NA MAM domain containing glycosylphosphatidylinositol anchor 1 MDGA1 ENSG00000112139 NA
1893 This gene encodes a soluble protein that is involved in endochondral bone formation, angiogenesis, and tumor biology. It also interacts with a variety of extracellular and structural proteins, contributing to the maintenance of skin integrity and homeostasis. Mutations in this gene are associated with lipoid proteinosis disorder (also known as hyalinosis cutis et mucosae or Urbach-Wiethe disease) that is characterized by generalized thickening of skin, mucosae and certain viscera. Alternatively spliced transcript variants encoding distinct isoforms have been described for this gene. extracellular matrix protein 1 ECM1 ENSG00000143369 NA
182 The jagged 1 protein encoded by JAG1 is the human homolog of the Drosophila jagged protein. Human jagged 1 is the ligand for the receptor notch 1, the latter a human homolog of the Drosophila jagged receptor notch. Mutations that alter the jagged 1 protein cause Alagille syndrome. Jagged 1 signalling through notch 1 has also been shown to play a role in hematopoiesis. jagged 1 JAG1 ENSG00000101384 NA
140576 NA S100 calcium binding protein A16 S100A16 ENSG00000188643 NA
716 This gene encodes a serine protease, which is a major constituent of the human complement subcomponent C1. C1s associates with two other complement components C1r and C1q in order to yield the first component of the serum complement system. Defects in this gene are the cause of selective C1s deficiency. complement component 1, s subcomponent C1S ENSG00000182326 NA
165 This gene encodes a member of carboxypeptidase A protein family. The encoded protein may function as a transcriptional repressor and play a role in adipogenesis and smooth muscle cell differentiation. Studies in mice suggest that this gene functions in wound healing and abdominal wall development. Overexpression of this gene is associated with glioblastoma. AE binding protein 1 AEBP1 ENSG00000106624 NA
6319 This gene encodes an enzyme involved in fatty acid biosynthesis, primarily the synthesis of oleic acid. The protein belongs to the fatty acid desaturase family and is an integral membrane protein located in the endoplasmic reticulum. Transcripts of approximately 3.9 and 5.2 kb, differing only by alternative polyadenylation signals, have been detected. A gene encoding a similar enzyme is located on chromosome 4 and a pseudogene of this gene is located on chromosome 17. stearoyl-CoA desaturase SCD ENSG00000099194 NA
2487 The protein encoded by this gene is a secreted protein that is involved in the regulation of bone development. Defects in this gene are a cause of female-specific osteoarthritis (OA) susceptibility. frizzled-related protein FRZB ENSG00000162998 NA
7074 NA T-cell lymphoma invasion and metastasis 1 TIAM1 ENSG00000156299 NA
4131 This gene encodes a protein that belongs to the microtubule-associated protein family. The proteins of this family are thought to be involved in microtubule assembly, which is an essential step in neurogenesis. The product of this gene is a precursor polypeptide that presumably undergoes proteolytic processing to generate the final MAP1B heavy chain and LC1 light chain. Gene knockout studies of the mouse microtubule-associated protein 1B gene suggested an important role in development and function of the nervous system. microtubule associated protein 1B MAP1B ENSG00000131711 NA
81 Alpha actinins belong to the spectrin gene superfamily which represents a diverse group of cytoskeletal proteins, including the alpha and beta spectrins and dystrophins. Alpha actinin is an actin-binding protein with multiple roles in different cell types. In nonmuscle cells, the cytoskeletal isoform is found along microfilament bundles and adherens-type junctions, where it is involved in binding actin to the membrane. In contrast, skeletal, cardiac, and smooth muscle isoforms are localized to the Z-disc and analogous dense bodies, where they help anchor the myofibrillar actin filaments. This gene encodes a nonmuscle, alpha actinin isoform which is concentrated in the cytoplasm, and thought to be involved in metastatic processes. Mutations in this gene have been associated with focal and segmental glomerulosclerosis. actinin alpha 4 ACTN4 ENSG00000130402 NA
308 The protein encoded by this gene belongs to the annexin family of calcium-dependent phospholipid binding proteins some of which have been implicated in membrane-related events along exocytotic and endocytotic pathways. Annexin 5 is a phospholipase A2 and protein kinase C inhibitory protein with calcium channel activity and a potential role in cellular signal transduction, inflammation, growth and differentiation. Annexin 5 has also been described as placental anticoagulant protein I, vascular anticoagulant-alpha, endonexin II, lipocortin V, placental protein 4 and anchorin CII. The gene spans 29 kb containing 13 exons, and encodes a single transcript of approximately 1.6 kb and a protein product with a molecular weight of about 35 kDa. annexin A5 ANXA5 ENSG00000164111 NA
49860 This gene encodes a member of the ‘fused gene’ family of proteins, which contain N-terminus EF-hand domains and multiple tandem peptide repeats. The encoded protein contains two EF-hand Ca2+ binding domains in its N-terminus and two glutamine- and threonine-rich 60 amino acid repeats in its C-terminus. This gene, also known as squamous epithelial heat shock protein 53, may play a role in the mucosal/epithelial immune response and epidermal differentiation. cornulin CRNN ENSG00000143536 NA
6035 This gene encodes a member of the pancreatic-type of secretory ribonucleases, a subset of the ribonuclease A superfamily. The encoded endonuclease cleaves internal phosphodiester RNA bonds on the 3’-side of pyrimidine bases. It prefers poly(C) as a substrate and hydrolyzes 2’,3’-cyclic nucleotides, with a pH optimum near 8.0. The encoded protein is monomeric and more commonly acts to degrade ds-RNA over ss-RNA. Alternative splicing occurs at this locus and four transcript variants encoding the same protein have been identified. ribonuclease A family member 1, pancreatic RNASE1 ENSG00000129538 NA
928 This gene encodes a member of the transmembrane 4 superfamily, also known as the tetraspanin family. Tetraspanins are cell surface glycoproteins with four transmembrane domains that form multimeric complexes with other cell surface proteins. The encoded protein functions in many cellular processes including differentiation, adhesion, and signal transduction, and expression of this gene plays a critical role in the suppression of cancer cell motility and metastasis. CD9 molecule CD9 ENSG00000010278 NA
1410 Mammalian lens crystallins are divided into alpha, beta, and gamma families. Alpha crystallins are composed of two gene products: alpha-A and alpha-B, for acidic and basic, respectively. Alpha crystallins can be induced by heat shock and are members of the small heat shock protein (HSP20) family. They act as molecular chaperones although they do not renature proteins and release them in the fashion of a true chaperone; instead they hold them in large soluble aggregates. Post-translational modifications decrease the ability to chaperone. These heterogeneous aggregates consist of 30-40 subunits; the alpha-A and alpha-B subunits have a 3:1 ratio, respectively. Two additional functions of alpha crystallins are an autokinase activity and participation in the intracellular architecture. The encoded protein has been identified as a moonlighting protein based on its ability to perform mechanistically distinct functions. Alpha-A and alpha-B gene products are differentially expressed; alpha-A is preferentially restricted to the lens and alpha-B is expressed widely in many tissues and organs. Elevated expression of alpha-B crystallin occurs in many neurological diseases; a missense mutation cosegregated in a family with a desmin-related myopathy. Alternative splicing results in multiple transcript variants. crystallin alpha B CRYAB ENSG00000109846 NA
4023 LPL encodes lipoprotein lipase, which is expressed in heart, muscle, and adipose tissue. LPL functions as a homodimer, and has the dual functions of triglyceride hydrolase and ligand/bridging factor for receptor-mediated lipoprotein uptake. Severe mutations that cause LPL deficiency result in type I hyperlipoproteinemia, while less extreme mutations in LPL are linked to many disorders of lipoprotein metabolism. lipoprotein lipase LPL ENSG00000175445 NA
140710 NA suppressor of glucose, autophagy associated 1 SOGA1 ENSG00000149639 NA
5166 This gene is a member of the PDK/BCKDK protein kinase family and encodes a mitochondrial protein with a histidine kinase domain. This protein is located in the matrix of the mitochondria and inhibits the pyruvate dehydrogenase complex by phosphorylating one of its subunits, thereby contributing to the regulation of glucose metabolism. Expression of this gene is regulated by glucocorticoids, retinoic acid and insulin. pyruvate dehydrogenase kinase 4 PDK4 ENSG00000004799 NA
4628 This gene encodes a member of the myosin superfamily. The protein represents a conventional non-muscle myosin; it should not be confused with the unconventional myosin-10 (MYO10). Myosins are actin-dependent motor proteins with diverse functions including regulation of cytokinesis, cell motility, and cell polarity. Mutations in this gene have been associated with May-Hegglin anomaly and developmental defects in brain and heart. Multiple transcript variants encoding different isoforms have been found for this gene. myosin, heavy chain 10, non-muscle MYH10 ENSG00000133026 NA
2670 This gene encodes one of the major intermediate filament proteins of mature astrocytes. It is used as a marker to distinguish astrocytes from other glial cells during development. Mutations in this gene cause Alexander disease, a rare disorder of astrocytes in the central nervous system. Alternative splicing results in multiple transcript variants encoding distinct isoforms. glial fibrillary acidic protein GFAP ENSG00000131095 NA
2995 Glycophorin C (GYPC) is an integral membrane glycoprotein. It is a minor species carried by human erythrocytes, but plays an important role in regulating the mechanical stability of red cells. A number of glycophorin C mutations have been described. The Gerbich and Yus phenotypes are due to deletion of exon 3 and 2, respectively. The Webb and Duch antigens, also known as glycophorin D, result from single point mutations of the glycophorin C gene. The glycophorin C protein has very little homology with glycophorins A and B. Alternate splicing results in multiple transcript variants. glycophorin C (Gerbich blood group) GYPC ENSG00000136732 NA
1675 This gene encodes a member of the S1, or chymotrypsin, family of serine peptidases. This protease catalyzes the cleavage of factor B, the rate-limiting step of the alternative pathway of complement activation. This protein also functions as an adipokine, a cell signaling protein secreted by adipocytes, which regulates insulin secretion in mice. Mutations in this gene underlie complement factor D deficiency, which is associated with recurrent bacterial meningitis infections in human patients. Alternative splicing of this gene results in multiple transcript variants. At least one of these variants encodes a preproprotein that is proteolytically processed to generate the mature protease. complement factor D CFD ENSG00000197766 NA
3487 This gene is a member of the insulin-like growth factor binding protein (IGFBP) family and encodes a protein with an IGFBP domain and a thyroglobulin type-I domain. The protein binds both insulin-like growth factors (IGFs) I and II and circulates in the plasma in both glycosylated and non-glycosylated forms. Binding of this protein prolongs the half-life of the IGFs and alters their interaction with cell surface receptors. insulin like growth factor binding protein 4 IGFBP4 ENSG00000141753 NA
715 NA complement C1r subcomponent C1R ENSG00000159403 NA
752 This gene encodes a formin-related protein. Formin-related proteins have been implicated in morphogenesis, cytokinesis, and cell polarity. An alternative splice variant has been described but its full length sequence has not been determined. formin like 1 FMNL1 ENSG00000184922 NA
10398 Myosin, a structural component of muscle, consists of two heavy chains and four light chains. The protein encoded by this gene is a myosin light chain that may regulate muscle contraction by modulating the ATPase activity of myosin heads. The encoded protein binds calcium and is activated by myosin light chain kinase. Two transcript variants encoding different isoforms have been found for this gene. myosin light chain 9 MYL9 ENSG00000101335 NA
5818 This gene encodes an adhesion protein that plays a role in the organization of adherens junctions and tight junctions in epithelial and endothelial cells. The protein is a calcium(2+)-independent cell-cell adhesion molecule that belongs to the immunoglobulin superfamily and has 3 extracellular immunoglobulin-like loops, a single transmembrane domain (in some isoforms), and a cytoplasmic region. This protein acts as a receptor for glycoprotein D (gD) of herpes simplex viruses 1 and 2 (HSV-1, HSV-2), and pseudorabies virus (PRV) and mediates viral entry into epithelial and neuronal cells. Mutations in this gene cause cleft lip and palate/ectodermal dysplasia 1 syndrome (CLPED1) as well as non-syndromic cleft lip with or without cleft palate (CL/P). Alternative splicing results in multiple transcript variants encoding proteins with distinct C-termini. nectin cell adhesion molecule 1 NECTIN1 ENSG00000110400 NA
22898 NA DENN domain containing 3 DENND3 ENSG00000105339 NA
5420 This gene encodes a member of the sialomucin protein family. The encoded protein was originally identified as an important component of glomerular podocytes. Podocytes are highly differentiated epithelial cells with interdigitating foot processes covering the outer aspect of the glomerular basement membrane. Other biological activities of the encoded protein include: binding in a membrane protein complex with Na+/H+ exchanger regulatory factor to intracellular cytoskeletal elements, playing a role in hematopoietic cell differentiation, and being expressed in vascular endothelium cells and binding to L-selectin. podocalyxin like PODXL ENSG00000128567 NA
3911 This gene encodes one of the vertebrate laminin alpha chains. Laminins, a family of extracellular matrix glycoproteins, are the major noncollagenous constituent of basement membranes. They have been implicated in a wide variety of biological processes including cell adhesion, differentiation, migration, signaling, neurite outgrowth and metastasis. Laminins are composed of 3 non-identical chains: laminin alpha, beta and gamma (formerly A, B1, and B2, respectively) and they form a cruciform structure consisting of 3 short arms, each formed by a different chain, and a long arm composed of all 3 chains. Each laminin chain is a multidomain protein encoded by a distinct gene. The protein encoded by this gene is the alpha-5 subunit of laminin-10 (laminin-511), laminin-11 (laminin-521) and laminin-15 (laminin-523). laminin subunit alpha 5 LAMA5 ENSG00000130702 NA
NA NA NA NA ENSG00000259716 TRUE
3861 This gene encodes a member of the keratin family, the most diverse group of intermediate filaments. This gene product, a type I keratin, is usually found as a heterotetramer with two keratin 5 molecules, a type II keratin. Together they form the cytoskeleton of epithelial cells. Mutations in the genes for these keratins are associated with epidermolysis bullosa simplex. At least one pseudogene has been identified at 17p12-p11. keratin 14 KRT14 ENSG00000186847 NA
58 The product encoded by this gene belongs to the actin family of proteins, which are highly conserved proteins that play a role in cell motility, structure and integrity. Alpha, beta and gamma actin isoforms have been identified, with alpha actins being a major constituent of the contractile apparatus, while beta and gamma actins are involved in the regulation of cell motility. This actin is an alpha actin that is found in skeletal muscle. Mutations in this gene cause nemaline myopathy type 3, congenital myopathy with excess of thin myofilaments, congenital myopathy with cores, and congenital myopathy with fiber-type disproportion, diseases that lead to muscle fiber defects. actin, alpha 1, skeletal muscle ACTA1 ENSG00000143632 NA
1535 Cytochrome b is comprised of a light chain (alpha) and a heavy chain (beta). This gene encodes the light, alpha subunit which has been proposed as a primary component of the microbicidal oxidase system of phagocytes. Mutations in this gene are associated with autosomal recessive chronic granulomatous disease (CGD), that is characterized by the failure of activated phagocytes to generate superoxide, which is important for the microbicidal activity of these cells. cytochrome b-245 alpha chain CYBA ENSG00000051523 NA
64855 NA family with sequence similarity 129 member B FAM129B ENSG00000136830 NA
125 The protein encoded by this gene is a member of the alcohol dehydrogenase family. Members of this enzyme family metabolize a wide variety of substrates, including ethanol, retinol, other aliphatic alcohols, hydroxysteroids, and lipid peroxidation products. This encoded protein, consisting of several homo- and heterodimers of alpha, beta, and gamma subunits, exhibits high activity for ethanol oxidation and plays a major role in ethanol catabolism. Three genes encoding alpha, beta and gamma subunits are tandemly organized in a genomic segment as a gene cluster. Two transcript variants encoding different isoforms have been found for this gene. alcohol dehydrogenase 1B (class I), beta polypeptide ADH1B ENSG00000196616 NA
158471 The protein encoded by this gene belongs to the B-cell CLL/lymphoma 2 and adenovirus E1B 19 kDa interacting family, whose members play roles in many cellular processes including apoptosis, cell transformation, and synaptic function. Several functions for this protein have been demonstrated including suppression of Ras homolog family member A activity, which results in reduced stress fiber formation and suppression of oncogenic cellular transformation. A high molecular weight isoform of this protein has also been shown to colocalize with Adaptor protein complex 2, beta-Adaptin and endodermal markers, suggesting an involvement in post-endocytic trafficking. In prostate cancer cells, this gene acts as a tumor suppressor and its expression is regulated by prostate cancer antigen 3, a non-protein coding gene on the opposite DNA strand in an intron of this gene. Prostate cancer antigen 3 regulates levels of this gene through formation of a double-stranded RNA that undergoes adenosine deaminase acting on RNA-dependent adenosine-to-inosine RNA editing. Alternative splicing results in multiple transcript variants. prune homolog 2 PRUNE2 ENSG00000106772 NA
4878 The protein encoded by this gene belongs to the natriuretic peptide family. Natriuretic peptides are implicated in the control of extracellular fluid volume and electrolyte homeostasis. This protein is synthesized as a large precursor (containing a signal peptide), which is processed to release a peptide from the N-terminus with similarity to vasoactive peptide, cardiodilatin, and another peptide from the C-terminus with natriuretic-diuretic activity. Mutations in this gene have been associated with atrial fibrillation familial type 6. This gene is located adjacent to another member of the natriuretic family of peptides on chromosome 1. natriuretic peptide A NPPA ENSG00000175206 NA
6497 This gene encodes the nuclear protooncogene protein homolog of avian sarcoma viral (v-ski) oncogene. It functions as a repressor of TGF-beta signaling, and may play a role in neural tube development and muscle differentiation. SKI proto-oncogene SKI ENSG00000157933 NA
54751 This gene encodes a protein with an N-terminal filamin-binding domain, a central proline-rich domain, and multiple C-terminal LIM domains. This protein localizes at cell junctions and may link cell adhesion structures to the actin cytoskeleton. This protein may be involved in the assembly and stabilization of actin-filaments and likely plays a role in modulating cell adhesion, cell morphology and cell motility. This protein also localizes to the nucleus and may affect cardiomyocyte differentiation after binding with the CSX/NKX2-5 transcription factor. Alternative splicing results in multiple transcript variants encoding different isoforms. filamin binding LIM protein 1 FBLIM1 ENSG00000162458 NA
4060 This gene encodes a member of the small leucine-rich proteoglycan (SLRP) family that includes decorin, biglycan, fibromodulin, keratocan, epiphycan, and osteoglycin. In these bifunctional molecules, the protein moiety binds collagen fibrils and the highly charged hydrophilic glycosaminoglycans regulate interfibrillar spacings. Lumican is the major keratan sulfate proteoglycan of the cornea but is also distributed in interstitial collagenous matrices throughout the body. Lumican may regulate collagen fibril organization and circumferential growth, corneal transparency, and epithelial cell migration and tissue repair. lumican LUM ENSG00000139329 NA
27295 The protein encoded by this gene contains a PDZ domain and a LIM domain, indicating that it may be involved in cytoskeletal assembly. In support of this, the encoded protein has been shown to bind the spectrin-like repeats of alpha-actinin-2 and to colocalize with alpha-actinin-2 at the Z lines of skeletal muscle. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene. Aberrant alternative splicing of this gene may play a role in myotonic dystrophy. PDZ and LIM domain 3 PDLIM3 ENSG00000154553 NA
27245 This gene encodes a protein containing two AT-hooks, which likely function in DNA binding. Mutations in this gene were found in individuals with Xia-Gibbs syndrome. AT-hook DNA binding motif containing 1 AHDC1 ENSG00000126705 NA
80709 NA AT-hook transcription factor AKNA ENSG00000106948 NA
8519 NA interferon induced transmembrane protein 1 IFITM1 ENSG00000185885 NA
23129 NA plexin D1 PLXND1 ENSG00000004399 NA
718 Complement component C3 plays a central role in the activation of complement system. Its activation is required for both classical and alternative complement activation pathways. The encoded preproprotein is proteolytically processed to generate alpha and beta subunits that form the mature protein, which is then further processed to generate numerous peptide products. The C3a peptide, also known as the C3a anaphylatoxin, modulates inflammation and possesses antimicrobial activity. Mutations in this gene are associated with atypical hemolytic uremic syndrome and age-related macular degeneration in human patients. complement component 3 C3 ENSG00000125730 NA
write.table(as.factor(out$query), paste0("../utilities/GTEX2013_sparse_load_sqrt/gene_names_clus_",18,".txt"), col.names = FALSE,
            row.names=FALSE, quote=FALSE);
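
The Factor 18 table contains a query (ENSG00000259716) with no annotation and TRUE in its last column, yet the write.table call above still writes that ID to the gene list. The following is a minimal sketch of dropping such rows before writing. It assumes the last column is the `notfound` flag returned by mygene::queryMany; `out_toy`, `matched`, and `out_file` are hypothetical names, and a toy data frame stands in for the real query result so the sketch runs without network access.

```r
# Toy stand-in for the data frame returned by mygene::queryMany();
# the values mirror three rows of the Factor 18 table.
# The column name `notfound` is an assumption about the query output.
out_toy <- data.frame(
  query    = c("ENSG00000130702", "ENSG00000259716", "ENSG00000186847"),
  symbol   = c("LAMA5", NA, "KRT14"),
  notfound = c(NA, TRUE, NA),
  stringsAsFactors = FALSE
)

# Keep rows whose notfound flag is not TRUE (NA means the query matched).
matched <- out_toy[!(out_toy$notfound %in% TRUE), ]

# Write one Ensembl ID per line; tempfile() replaces the ../utilities/
# path here so the sketch can run anywhere.
out_file <- tempfile(fileext = ".txt")
write.table(matched$query, out_file,
            col.names = FALSE, row.names = FALSE, quote = FALSE)

readLines(out_file)
```

Whether unmatched IDs should be kept depends on the downstream tool: an enrichment tool that resolves Ensembl IDs itself may still recognize them, so filtering is a choice rather than a requirement.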

Factor 19 Annotations

out <- mygene::queryMany(gene_list[19,],  scopes="ensembl.gene", fields=c("name", "summary", "symbol"), species="human");
## Finished
## Pass returnall=TRUE to return lists of duplicate or missing query terms.
kable(as.data.frame(out))
name summary X_id query symbol
actin, alpha 1, skeletal muscle The product encoded by this gene belongs to the actin family of proteins, which are highly conserved proteins that play a role in cell motility, structure and integrity. Alpha, beta and gamma actin isoforms have been identified, with alpha actins being a major constituent of the contractile apparatus, while beta and gamma actins are involved in the regulation of cell motility. This actin is an alpha actin that is found in skeletal muscle. Mutations in this gene cause nemaline myopathy type 3, congenital myopathy with excess of thin myofilaments, congenital myopathy with cores, and congenital myopathy with fiber-type disproportion, diseases that lead to muscle fiber defects. 58 ENSG00000143632 ACTA1
creatine kinase, M-type The protein encoded by this gene is a cytoplasmic enzyme involved in energy homeostasis and is an important serum marker for myocardial infarction. The encoded protein reversibly catalyzes the transfer of phosphate between ATP and various phosphogens such as creatine phosphate. It acts as a homodimer in striated muscle as well as in other tissues, and as a heterodimer with a similar brain isozyme in heart. The encoded protein is a member of the ATP:guanido phosphotransferase protein family. 1158 ENSG00000104879 CKM
titin This gene encodes a large abundant protein of striated muscle. The product of this gene is divided into two regions, a N-terminal I-band and a C-terminal A-band. The I-band, which is the elastic part of the molecule, contains two regions of tandem immunoglobulin domains on either side of a PEVK region that is rich in proline, glutamate, valine and lysine. The A-band, which is thought to act as a protein-ruler, contains a mixture of immunoglobulin and fibronectin repeats, and possesses kinase activity. An N-terminal Z-disc region and a C-terminal M-line region bind to the Z-line and M-line of the sarcomere, respectively, so that a single titin molecule spans half the length of a sarcomere. Titin also contains binding sites for muscle associated proteins so it serves as an adhesion template for the assembly of contractile machinery in muscle cells. It has also been identified as a structural protein for chromosomes. Alternative splicing of this gene results in multiple transcript variants. Considerable variability exists in the I-band, the M-line and the Z-disc regions of titin. Variability in the I-band region contributes to the differences in elasticity of different titin isoforms and, therefore, to the differences in elasticity of different muscle types. Mutations in this gene are associated with familial hypertrophic cardiomyopathy 9, and autoantibodies to titin are produced in patients with the autoimmune disease scleroderma. 7273 ENSG00000155657 TTN
nebulin This gene encodes nebulin, a giant protein component of the cytoskeletal matrix that coexists with the thick and thin filaments within the sarcomeres of skeletal muscle. In most vertebrates, nebulin accounts for 3 to 4% of the total myofibrillar protein. The encoded protein contains approximately 30-amino acid long modules that can be classified into 7 types and other repeated modules. Protein isoform sizes vary from 600 to 800 kD due to alternative splicing that is tissue-, species-, and developmental stage-specific. Of the 183 exons in the nebulin gene, at least 43 are alternatively spliced, although exons 143 and 144 are not found in the same transcript. Of the several thousand transcript variants predicted for nebulin, the RefSeq Project has decided to create three representative RefSeq records. Mutations in this gene are associated with recessive nemaline myopathy. 4703 ENSG00000183091 NEB
myosin binding protein C, slow type This gene encodes a member of the myosin-binding protein C family. Myosin-binding protein C family members are myosin-associated proteins found in the cross-bridge-bearing zone (C region) of A bands in striated muscle. The encoded protein is the slow skeletal muscle isoform of myosin-binding protein C and plays an important role in muscle contraction by recruiting muscle-type creatine kinase to myosin filaments. Mutations in this gene are associated with distal arthrogryposis type I. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene. 4604 ENSG00000196091 MYBPC1
myosin, heavy chain 7, cardiac muscle, beta Muscle myosin is a hexameric protein containing 2 heavy chain subunits, 2 alkali light chain subunits, and 2 regulatory light chain subunits. This gene encodes the beta (or slow) heavy chain subunit of cardiac myosin. It is expressed predominantly in normal human ventricle. It is also expressed in skeletal muscle tissues rich in slow-twitch type I muscle fibers. Changes in the relative abundance of this protein and the alpha (or fast) heavy subunit of cardiac myosin correlate with the contractile velocity of cardiac muscle. Its expression is also altered during thyroid hormone depletion and hemodynamic overloading. Mutations in this gene are associated with familial hypertrophic cardiomyopathy, myosin storage myopathy, dilated cardiomyopathy, and Laing early-onset distal myopathy. 4625 ENSG00000092054 MYH7
myosin, heavy chain 1, skeletal muscle, adult Myosin is a major contractile protein which converts chemical energy into mechanical energy through the hydrolysis of ATP. Myosin is a hexameric protein composed of a pair of myosin heavy chains (MYH) and two pairs of nonidentical light chains. Myosin heavy chains are encoded by a multigene family. In mammals at least 10 different myosin heavy chain (MYH) isoforms have been described from striated, smooth, and nonmuscle cells. These isoforms show expression that is spatially and temporally regulated during development. 4619 ENSG00000109061 MYH1
myosin light chain 2 This gene encodes the regulatory light chain associated with cardiac myosin beta (or slow) heavy chain. Ca2+ triggers the phosphorylation of regulatory light chain that in turn triggers contraction. Mutations in this gene are associated with mid-left ventricular chamber type hypertrophic cardiomyopathy. 4633 ENSG00000111245 MYL2
myosin, heavy chain 2, skeletal muscle, adult Myosins are actin-based motor proteins that function in the generation of mechanical force in eukaryotic cells. Muscle myosins are heterohexamers composed of 2 myosin heavy chains and 2 pairs of nonidentical myosin light chains. This gene encodes a member of the class II or conventional myosin heavy chains, and functions in skeletal muscle contraction. This gene is found in a cluster of myosin heavy chain genes on chromosome 17. A mutation in this gene results in inclusion body myopathy-3. Multiple alternatively spliced variants, encoding the same protein, have been identified. 4620 ENSG00000125414 MYH2
troponin T1, slow skeletal type This gene encodes a protein that is a subunit of troponin, which is a regulatory complex located on the thin filament of the sarcomere. This complex regulates striated muscle contraction in response to fluctuations in intracellular calcium concentration. This complex is composed of three subunits: troponin C, which binds calcium, troponin T, which binds tropomyosin, and troponin I, which is an inhibitory subunit. This protein is the slow skeletal troponin T subunit. Mutations in this gene cause nemaline myopathy type 5, also known as Amish nemaline myopathy, a neuromuscular disorder characterized by muscle weakness and rod-shaped, or nemaline, inclusions in skeletal muscle fibers which affects infants, resulting in death due to respiratory insufficiency, usually in the second year. Multiple transcript variants encoding different isoforms have been found for this gene. 7138 ENSG00000105048 TNNT1
actin, beta This gene encodes one of six different actin proteins. Actins are highly conserved proteins that are involved in cell motility, structure, and integrity. This actin is a major constituent of the contractile apparatus and one of the two nonmuscle cytoskeletal actins. 60 ENSG00000075624 ACTB
natriuretic peptide A The protein encoded by this gene belongs to the natriuretic peptide family. Natriuretic peptides are implicated in the control of extracellular fluid volume and electrolyte homeostasis. This protein is synthesized as a large precursor (containing a signal peptide), which is processed to release a peptide from the N-terminus with similarity to vasoactive peptide, cardiodilatin, and another peptide from the C-terminus with natriuretic-diuretic activity. Mutations in this gene have been associated with atrial fibrillation familial type 6. This gene is located adjacent to another member of the natriuretic family of peptides on chromosome 1. 4878 ENSG00000175206 NPPA
troponin C2, fast skeletal type Troponin (Tn), a key protein complex in the regulation of striated muscle contraction, is composed of 3 subunits. The Tn-I subunit inhibits actomyosin ATPase, the Tn-T subunit binds tropomyosin and Tn-C, while the Tn-C subunit binds calcium and overcomes the inhibitory action of the troponin complex on actin filaments. The protein encoded by this gene is the Tn-C subunit. 7125 ENSG00000101470 TNNC2
phosphorylase, glycogen, muscle This gene encodes a muscle enzyme involved in glycogenolysis. Highly similar enzymes encoded by different genes are found in liver and brain. Mutations in this gene are associated with McArdle disease (myophosphorylase deficiency), a glycogen storage disease of muscle. Alternative splicing results in multiple transcript variants. 5837 ENSG00000068976 PYGM
troponin I1, slow skeletal type Troponin proteins associate with tropomyosin and regulate the calcium sensitivity of the myofibril contractile apparatus of striated muscles. Troponin I (TnI), along with troponin T (TnT) and troponin C (TnC), is one of 3 subunits that form the troponin complex of the thin filaments of striated muscle. TnI is the inhibitory subunit; blocking actin-myosin interactions and thereby mediating striated muscle relaxation. The TnI subfamily contains three genes: TnI-skeletal-fast-twitch, TnI-skeletal-slow-twitch, and TnI-cardiac. The TnI-fast and TnI-slow genes are expressed in fast-twitch and slow-twitch skeletal muscle fibers, respectively, while the TnI-cardiac gene is expressed exclusively in cardiac muscle tissue. This gene encodes the Troponin-I-skeletal-slow-twitch protein. This gene is expressed in cardiac and skeletal muscle during early development but is restricted to slow-twitch skeletal muscle fibers in adults. The encoded protein prevents muscle contraction by inhibiting calcium-mediated conformational changes in actin-myosin complexes. 7135 ENSG00000159173 TNNI1
ATPase sarcoplasmic/endoplasmic reticulum Ca2+ transporting 1 This gene encodes one of the SERCA Ca(2+)-ATPases, which are intracellular pumps located in the sarcoplasmic or endoplasmic reticula of muscle cells. This enzyme catalyzes the hydrolysis of ATP coupled with the translocation of calcium from the cytosol to the sarcoplasmic reticulum lumen, and is involved in muscular excitation and contraction. Mutations in this gene cause some autosomal recessive forms of Brody disease, characterized by increasing impairment of muscular relaxation during exercise. Alternative splicing results in three transcript variants encoding different isoforms. 487 ENSG00000196296 ATP2A1
ryanodine receptor 1 This gene encodes a ryanodine receptor found in skeletal muscle. The encoded protein functions as a calcium release channel in the sarcoplasmic reticulum but also serves to connect the sarcoplasmic reticulum and transverse tubule. Mutations in this gene are associated with malignant hyperthermia susceptibility, central core disease, and minicore myopathy with external ophthalmoplegia. Alternatively spliced transcripts encoding different isoforms have been described. 6261 ENSG00000196218 RYR1
cardiomyopathy associated 5 NA 202333 ENSG00000164309 CMYA5
myosin light chain 1 Myosin is a hexameric ATPase cellular motor protein. It is composed of two heavy chains, two nonphosphorylatable alkali light chains, and two phosphorylatable regulatory light chains. This gene encodes a myosin alkali light chain expressed in fast skeletal muscle. Two transcript variants have been identified for this gene. 4632 ENSG00000168530 MYL1
hemoglobin subunit beta The alpha (HBA) and beta (HBB) loci determine the structure of the 2 types of polypeptide chains in adult hemoglobin, Hb A. The normal adult hemoglobin tetramer consists of two alpha chains and two beta chains. Mutant beta globin causes sickle cell anemia. Absence of beta chain causes beta-zero-thalassemia. Reduced amounts of detectable beta globin causes beta-plus-thalassemia. The order of the genes in the beta-globin cluster is 5’-epsilon – gamma-G – gamma-A – delta – beta–3’. 3043 ENSG00000244734 HBB
carbonic anhydrase 3 Carbonic anhydrase III (CAIII) is a member of a multigene family (at least six separate genes are known) that encodes carbonic anhydrase isozymes. These carbonic anhydrases are a class of metalloenzymes that catalyze the reversible hydration of carbon dioxide and are differentially expressed in a number of cell types. The expression of the CA3 gene is strictly tissue specific and present at high levels in skeletal muscle and much lower levels in cardiac and smooth muscle. A proportion of carriers of Duchenne muscular dystrophy have a higher CA3 level than normal. The gene spans 10.3 kb and contains seven exons and six introns. 761 ENSG00000164879 CA3
myoglobin This gene encodes a member of the globin superfamily and is expressed in skeletal and cardiac muscles. The encoded protein is a haemoprotein contributing to intracellular oxygen storage and transcellular facilitated diffusion of oxygen. At least three alternatively spliced transcript variants encoding the same protein have been reported. 4151 ENSG00000198125 MB
troponin T3, fast skeletal type The binding of Ca(2+) to the trimeric troponin complex initiates the process of muscle contraction. Increased Ca(2+) concentrations produce a conformational change in the troponin complex that is transmitted to tropomyosin dimers situated along actin filaments. The altered conformation permits increased interaction between a myosin head and an actin filament which, ultimately, produces a muscle contraction. The troponin complex has protein subunits C, I, and T. Subunit C binds Ca(2+) and subunit I binds to actin and inhibits actin-myosin interaction. Subunit T binds the troponin complex to the tropomyosin complex and is also required for Ca(2+)-mediated activation of actomyosin ATPase activity. There are 3 different troponin T genes that encode tissue-specific isoforms of subunit T for fast skeletal-, slow skeletal-, and cardiac-muscle. This gene encodes fast skeletal troponin T protein; also known as troponin T type 3. Alternative splicing results in multiple transcript variants encoding additional distinct troponin T type 3 isoforms. A developmentally regulated switch between fetal/neonatal and adult troponin T type 3 isoforms occurs. Additional splice variants have been described but their biological validity has not been established. Mutations in this gene may cause distal arthrogryposis multiplex congenita type 2B (DA2B). 7140 ENSG00000130595 TNNT3
uncharacterized LOC101927055 NA 101927055 ENSG00000237298 LOC101927055
TTN antisense RNA 1 NA 100506866 ENSG00000237298 TTN-AS1
obscurin, cytoskeletal calmodulin and titin-interacting RhoGEF The obscurin gene spans more than 150 kb, contains over 80 exons and encodes a protein of approximately 720 kDa. The encoded protein contains 68 Ig domains, 2 fibronectin domains, 1 calcium/calmodulin-binding domain, 1 RhoGEF domain with an associated PH domain, and 2 serine-threonine kinase domains. This protein belongs to the family of giant sarcomeric signaling proteins that includes titin and nebulin, and may have a role in the organization of myofibrils during assembly and may mediate interactions between the sarcoplasmic reticulum and myofibrils. Alternatively spliced transcript variants encoding different isoforms have been identified. 84033 ENSG00000154358 OBSCN
eukaryotic translation elongation factor 1 alpha 1 This gene encodes an isoform of the alpha subunit of the elongation factor-1 complex, which is responsible for the enzymatic delivery of aminoacyl tRNAs to the ribosome. This isoform (alpha 1) is expressed in brain, placenta, lung, liver, kidney, and pancreas, and the other isoform (alpha 2) is expressed in brain, heart and skeletal muscle. This isoform is identified as an autoantigen in 66% of patients with Felty syndrome. This gene has been found to have multiple copies on many chromosomes, some of which, if not all, represent different pseudogenes. 1915 ENSG00000156508 EEF1A1
nebulin related anchoring protein NA 4892 ENSG00000197893 NRAP
enolase 3 This gene encodes one of the three enolase isoenzymes found in mammals. This isoenzyme is found in skeletal muscle cells in the adult where it may play a role in muscle development and regeneration. A switch from alpha enolase to beta enolase occurs in muscle tissue during development in rodents. Mutations in this gene have been associated with glycogen storage disease. Alternatively spliced transcript variants encoding different isoforms have been described. 2027 ENSG00000108515 ENO3
myosin binding protein C, fast type This gene encodes a member of the myosin-binding protein C family. This family includes the fast-, slow- and cardiac-type isoforms, each of which is a myosin-associated protein found in the cross-bridge-bearing zone (C region) of A bands in striated muscle. The protein encoded by this locus is referred to as the fast-type isoform. Mutations in the related but distinct genes encoding the slow-type and cardiac-type isoforms have been associated with distal arthrogryposis, type 1 and hypertrophic cardiomyopathy, respectively. 4606 ENSG00000086967 MYBPC2
kelch like family member 41 This gene is a member of the kelch-like family. The encoded protein contains a BACK domain, a BTB/POZ domain, and 5 Kelch repeats. This protein is thought to function in skeletal muscle development and maintenance. Mutations in this gene have been associated with nemaline myopathy (NM), a rare congenital muscle disorder. 10324 ENSG00000239474 KLHL41
protamine 2 Protamines substitute for histones in the chromatin of sperm during the haploid phase of spermatogenesis, and are the major DNA-binding proteins in the nucleus of sperm in many vertebrates. They package the sperm DNA into a highly condensed complex in a volume less than 5% of a somatic cell nucleus. Many mammalian species have only one protamine (protamine 1); however, a few species, including human and mouse, have two. This gene encodes protamine 2, which is cleaved to give rise to a family of protamine 2 peptides. Alternatively spliced transcript variants have also been found for this gene. 5620 ENSG00000122304 PRM2
myosin, heavy chain 6, cardiac muscle, alpha Cardiac muscle myosin is a hexamer consisting of two heavy chain subunits, two light chain subunits, and two regulatory subunits. This gene encodes the alpha heavy chain subunit of cardiac myosin. The gene is located 4kb downstream of the gene encoding the beta heavy chain subunit of cardiac myosin. Mutations in this gene cause familial hypertrophic cardiomyopathy and atrial septal defect 3. 4624 ENSG00000197616 MYH6
troponin I2, fast skeletal type This gene encodes a fast-twitch skeletal muscle protein, a member of the troponin I gene family, and a component of the troponin complex including troponin T, troponin C and troponin I subunits. The troponin complex, along with tropomyosin, is responsible for the calcium-dependent regulation of striated muscle contraction. Mouse studies show that this component is also present in vascular smooth muscle and may play a role in regulation of smooth muscle function. In addition to muscle tissues, this protein is found in corneal epithelium, cartilage where it is an inhibitor of angiogenesis to inhibit tumor growth and metastasis, and mammary gland where it functions as a co-activator of estrogen receptor-related receptor alpha. This protein also suppresses tumor growth in human ovarian carcinoma. Mutations in this gene cause myopathy and distal arthrogryposis type 2B. Alternatively spliced transcript variants have been found for this gene. 7136 ENSG00000130598 TNNI2
myosin light chain, phosphorylatable, fast skeletal muscle NA 29895 ENSG00000180209 MYLPF
NPPA antisense RNA 1 NA ENSG00000242349 ENSG00000242349 NPPA-AS1
poly(A) binding protein cytoplasmic 1 This gene encodes a poly(A) binding protein. The protein shuttles between the nucleus and cytoplasm and binds to the 3’ poly(A) tail of eukaryotic messenger RNAs via RNA-recognition motifs. The binding of this protein to poly(A) promotes ribosome recruitment and translation initiation; it is also required for poly(A) shortening which is the first step in mRNA decay. The gene is part of a small gene family including three protein-coding genes and several pseudogenes. 26986 ENSG00000070756 PABPC1
myosin, heavy chain 9, non-muscle This gene encodes a conventional non-muscle myosin; this protein should not be confused with the unconventional myosin-9a or 9b (MYO9A or MYO9B). The encoded protein is a myosin IIA heavy chain that contains an IQ domain and a myosin head-like domain which is involved in several important functions, including cytokinesis, cell motility and maintenance of cell shape. Defects in this gene have been associated with non-syndromic sensorineural deafness autosomal dominant type 17, Epstein syndrome, Alport syndrome with macrothrombocytopenia, Sebastian syndrome, Fechtner syndrome and macrothrombocytopenia with progressive sensorineural deafness. 4627 ENSG00000100345 MYH9
myosin light chain 7 NA 58498 ENSG00000106631 MYL7
troponin C1, slow skeletal and cardiac type Troponin is a central regulatory protein of striated muscle contraction, and together with tropomyosin, is located on the actin filament. Troponin consists of 3 subunits: TnI, which is the inhibitor of actomyosin ATPase; TnT, which contains the binding site for tropomyosin; and TnC, the protein encoded by this gene. The binding of calcium to TnC abolishes the inhibitory action of TnI, thus allowing the interaction of actin with myosin, the hydrolysis of ATP, and the generation of tension. Mutations in this gene are associated with cardiomyopathy dilated type 1Z. 7134 ENSG00000114854 TNNC1
H3 histone, family 3B (H3.3B) Histones are basic nuclear proteins that are responsible for the nucleosome structure of the chromosomal fiber in eukaryotes. Two molecules of each of the four core histones (H2A, H2B, H3, and H4) form an octamer, around which approximately 146 bp of DNA is wrapped in repeating units, called nucleosomes. The linker histone, H1, interacts with linker DNA between nucleosomes and functions in the compaction of chromatin into higher order structures. This gene contains introns and its mRNA is polyadenylated, unlike most histone genes. The protein encoded by this gene is a replication-independent histone that is a member of the histone H3 family. Pseudogenes of this gene have been identified on the X chromosome, and on chromosomes 5, 13 and 17. 3021 ENSG00000132475 H3F3B
eukaryotic translation elongation factor 1 alpha 2 This gene encodes an isoform of the alpha subunit of the elongation factor-1 complex, which is responsible for the enzymatic delivery of aminoacyl tRNAs to the ribosome. This isoform (alpha 2) is expressed in brain, heart and skeletal muscle, and the other isoform (alpha 1) is expressed in brain, placenta, lung, liver, kidney, and pancreas. This gene may be critical in the development of ovarian cancer. 1917 ENSG00000101210 EEF1A2
myozenin 1 The protein encoded by this gene is primarily expressed in the skeletal muscle, and belongs to the myozenin family. Members of this family function as calcineurin-interacting proteins that help tether calcineurin to the sarcomere of cardiac and skeletal muscle. They play an important role in modulation of calcineurin signaling. 58529 ENSG00000177791 MYOZ1
eukaryotic translation elongation factor 1 alpha 1 pseudogene 5 NA ENSG00000196205 ENSG00000196205 EEF1A1P5
tropomyosin 3 This gene encodes a member of the tropomyosin family of actin-binding proteins. Tropomyosins are dimers of coiled-coil proteins that provide stability to actin filaments and regulate access of other actin-binding proteins. Mutations in this gene result in autosomal dominant nemaline myopathy and other muscle disorders. This locus is involved in translocations with other loci, including anaplastic lymphoma receptor tyrosine kinase (ALK) and neurotrophic tyrosine kinase receptor type 1 (NTRK1), which result in the formation of fusion proteins that act as oncogenes. There are numerous pseudogenes for this gene on different chromosomes. Alternative splicing results in multiple transcript variants. 7170 ENSG00000143549 TPM3
protamine 1 NA 5619 ENSG00000175646 PRM1
ribosomal protein L3 Ribosomes, the complexes that catalyze protein synthesis, consist of a small 40S subunit and a large 60S subunit. Together these subunits are composed of 4 RNA species and approximately 80 structurally distinct proteins. This gene encodes a ribosomal protein that is a component of the 60S subunit. The protein belongs to the L3P family of ribosomal proteins and it is located in the cytoplasm. The protein can bind to the HIV-1 TAR mRNA, and it has been suggested that the protein contributes to tat-mediated transactivation. This gene is co-transcribed with several small nucleolar RNA genes, which are located in several of this gene’s introns. Alternate transcriptional splice variants, encoding different isoforms, have been characterized. As is typical for genes encoding ribosomal proteins, there are multiple processed pseudogenes of this gene dispersed through the genome. 6122 ENSG00000100316 RPL3
hemoglobin subunit alpha 2 The human alpha globin gene cluster located on chromosome 16 spans about 30 kb and includes seven loci: 5’- zeta - pseudozeta - mu - pseudoalpha-1 - alpha-2 - alpha-1 - theta - 3’. The alpha-2 (HBA2) and alpha-1 (HBA1) coding sequences are identical. These genes differ slightly over the 5’ untranslated regions and the introns, but they differ significantly over the 3’ untranslated regions. Two alpha chains plus two beta chains constitute HbA, which in normal adult life comprises about 97% of the total hemoglobin; alpha chains combine with delta chains to constitute HbA-2, which with HbF (fetal hemoglobin) makes up the remaining 3% of adult hemoglobin. Alpha thalassemias result from deletions of each of the alpha genes as well as deletions of both HBA2 and HBA1; some nondeletion alpha thalassemias have also been reported. 3040 ENSG00000188536 HBA2
DEAD-box helicase 5 DEAD box proteins, characterized by the conserved motif Asp-Glu-Ala-Asp (DEAD), are putative RNA helicases. They are implicated in a number of cellular processes involving alteration of RNA secondary structure, such as translation initiation, nuclear and mitochondrial splicing, and ribosome and spliceosome assembly. Based on their distribution patterns, some members of this family are believed to be involved in embryogenesis, spermatogenesis, and cellular growth and division. This gene encodes a DEAD box protein, which is a RNA-dependent ATPase, and also a proliferation-associated nuclear antigen, specifically reacting with the simian virus 40 tumor antigen. Alternative splicing results in multiple transcript variants. 1655 ENSG00000108654 DDX5
uncharacterized LOC100129518 NA 100129518 ENSG00000112096 LOC100129518
superoxide dismutase 2, mitochondrial This gene is a member of the iron/manganese superoxide dismutase family. It encodes a mitochondrial protein that forms a homotetramer and binds one manganese ion per subunit. This protein binds to the superoxide byproducts of oxidative phosphorylation and converts them to hydrogen peroxide and diatomic oxygen. Mutations in this gene have been associated with idiopathic cardiomyopathy (IDC), premature aging, sporadic motor neuron disease, and cancer. Alternative splicing of this gene results in multiple transcript variants. A related pseudogene has been identified on chromosome 1. 6648 ENSG00000112096 SOD2
dual specificity phosphatase 1 The expression of DUSP1 gene is induced in human skin fibroblasts by oxidative/heat stress and growth factors. It specifies a protein with structural features similar to members of the non-receptor-type protein-tyrosine phosphatase family, and which has significant amino-acid sequence similarity to a Tyr/Ser-protein phosphatase encoded by the late gene H1 of vaccinia virus. The bacterially expressed and purified DUSP1 protein has intrinsic phosphatase activity, and specifically inactivates mitogen-activated protein (MAP) kinase in vitro by the concomitant dephosphorylation of both its phosphothreonine and phosphotyrosine residues. Furthermore, it suppresses the activation of MAP kinase by oncogenic ras in extracts of Xenopus oocytes. Thus, DUSP1 may play an important role in the human cellular response to environmental stress as well as in the negative regulation of cellular proliferation. 1843 ENSG00000120129 DUSP1
LDL receptor related protein 1 This gene encodes a member of the low-density lipoprotein receptor family of proteins. The encoded preproprotein is proteolytically processed by furin to generate 515 kDa and 85 kDa subunits that form the mature receptor (PMID: 8546712). This receptor is involved in several cellular processes, including intracellular signaling, lipid homeostasis, and clearance of apoptotic cells. In addition, the encoded protein is necessary for the alpha 2-macroglobulin-mediated clearance of secreted amyloid precursor protein and beta-amyloid, the main component of amyloid plaques found in Alzheimer patients. Expression of this gene decreases with age and has been found to be lower than controls in brain tissue from Alzheimer’s disease patients. 4035 ENSG00000123384 LRP1
tropomyosin 4 This gene encodes a member of the tropomyosin family of actin-binding proteins involved in the contractile system of striated and smooth muscles and the cytoskeleton of non-muscle cells. Tropomyosins are dimers of coiled-coil proteins that polymerize end-to-end along the major groove in most actin filaments. They provide stability to the filaments and regulate access of other actin-binding proteins. In muscle cells, they regulate muscle contraction by controlling the binding of myosin heads to the actin filament. Multiple transcript variants encoding different isoforms have been found for this gene. 7171 ENSG00000167460 TPM4
actin, alpha, cardiac muscle 1 Actins are highly conserved proteins that are involved in various types of cell motility. Polymerization of globular actin (G-actin) leads to a structural filament (F-actin) in the form of a two-stranded helix. Each actin can bind to four others. The protein encoded by this gene belongs to the actin family which is comprised of three main groups of actin isoforms, alpha, beta, and gamma. The alpha actins are found in muscle tissues and are a major constituent of the contractile apparatus. Defects in this gene have been associated with idiopathic dilated cardiomyopathy (IDC) and familial hypertrophic cardiomyopathy (FHC). 70 ENSG00000159251 ACTC1
actinin alpha 2 Alpha actinins belong to the spectrin gene superfamily which represents a diverse group of cytoskeletal proteins, including the alpha and beta spectrins and dystrophins. Alpha actinin is an actin-binding protein with multiple roles in different cell types. In nonmuscle cells, the cytoskeletal isoform is found along microfilament bundles and adherens-type junctions, where it is involved in binding actin to the membrane. In contrast, skeletal, cardiac, and smooth muscle isoforms are localized to the Z-disc and analogous dense bodies, where they help anchor the myofibrillar actin filaments. This gene encodes a muscle-specific, alpha actinin isoform that is expressed in both skeletal and cardiac muscles. Several transcript variants encoding different isoforms have been found for this gene. 88 ENSG00000077522 ACTN2
myotilin This gene encodes a cytoskeletal protein which plays a significant role in the stability of thin filaments during muscle contraction. This protein binds F-actin, crosslinks actin filaments, and prevents latrunculin A-induced filament disassembly. Mutations in this gene have been associated with limb-girdle muscular dystrophy and myofibrillar myopathies. Several alternatively spliced transcript variants of this gene have been described, but the full-length nature of some of these variants has not been determined. 9499 ENSG00000120729 MYOT
calsequestrin 1 This gene encodes the skeletal muscle specific member of the calsequestrin protein family. Calsequestrin functions as a luminal sarcoplasmic reticulum calcium sensor in both cardiac and skeletal muscle cells. This protein, also known as calmitine, functions as a calcium regulator in the mitochondria of skeletal muscle. This protein is absent in patients with Duchenne and Becker types of muscular dystrophy. 844 ENSG00000143318 CASQ1
bridging integrator 1 This gene encodes several isoforms of a nucleocytoplasmic adaptor protein, one of which was initially identified as a MYC-interacting protein with features of a tumor suppressor. Isoforms that are expressed in the central nervous system may be involved in synaptic vesicle endocytosis and may interact with dynamin, synaptojanin, endophilin, and clathrin. Isoforms that are expressed in muscle and ubiquitously expressed isoforms localize to the cytoplasm and nucleus and activate a caspase-independent apoptotic process. Studies in mouse suggest that this gene plays an important role in cardiac muscle development. Alternate splicing of the gene results in several transcript variants encoding different isoforms. Aberrant splice variants expressed in tumor cell lines have also been described. 274 ENSG00000136717 BIN1
thyroglobulin Thyroglobulin (Tg) is a glycoprotein homodimer produced predominantly by the thyroid gland. It acts as a substrate for the synthesis of thyroxine and triiodothyronine as well as the storage of the inactive forms of thyroid hormone and iodine. Thyroglobulin is secreted from the endoplasmic reticulum to its site of iodination, and subsequent thyroxine biosynthesis, in the follicular lumen. Mutations in this gene cause thyroid dyshormonogenesis, manifested as goiter, and are associated with moderate to severe congenital hypothyroidism. Polymorphisms in this gene are associated with susceptibility to autoimmune thyroid diseases (AITD) such as Graves disease and Hashimoto thyroiditis. 7038 ENSG00000042832 TG
titin-cap Sarcomere assembly is regulated by the muscle protein titin. Titin is a giant elastic protein with kinase activity that extends half the length of a sarcomere. It serves as a scaffold to which myofibrils and other muscle related proteins are attached. This gene encodes a protein found in striated and cardiac muscle that binds to the titin Z1-Z2 domains and is a substrate of titin kinase, interactions thought to be critical to sarcomere assembly. Mutations in this gene are associated with limb-girdle muscular dystrophy type 2G. 8557 ENSG00000173991 TCAP
SH3 and cysteine rich domain 3 The protein encoded by this gene is a component of the excitation-contraction coupling machinery of muscles. This protein is a member of the Stac gene family and contains an N-terminal cysteine-rich domain and two SH3 domains. Mutations in this gene are a cause of Native American myopathy. 246329 ENSG00000185482 STAC3
aldolase, fructose-bisphosphate A The protein encoded by this gene, Aldolase A (fructose-bisphosphate aldolase), is a glycolytic enzyme that catalyzes the reversible conversion of fructose-1,6-bisphosphate to glyceraldehyde 3-phosphate and dihydroxyacetone phosphate. Three aldolase isozymes (A, B, and C), encoded by three different genes, are differentially expressed during development. Aldolase A is found in the developing embryo and is produced in even greater amounts in adult muscle. Aldolase A expression is repressed in adult liver, kidney and intestine and similar to aldolase C levels in brain and other nervous tissue. Aldolase A deficiency has been associated with myopathy and hemolytic anemia. Alternative splicing and alternative promoter usage results in multiple transcript variants. Related pseudogenes have been identified on chromosomes 3 and 10. 226 ENSG00000149925 ALDOA
myosin light chain 6 Myosin is a hexameric ATPase cellular motor protein. It is composed of two heavy chains, two nonphosphorylatable alkali light chains, and two phosphorylatable regulatory light chains. This gene encodes a myosin alkali light chain that is expressed in smooth muscle and non-muscle tissues. Genomic sequences representing several pseudogenes have been described and two transcript variants encoding different isoforms have been identified for this gene. 4637 ENSG00000092841 MYL6
glutathione peroxidase 3 This gene product belongs to the glutathione peroxidase family, which functions in the detoxification of hydrogen peroxide. It contains a selenocysteine (Sec) residue at its active site. The selenocysteine is encoded by the UGA codon, which normally signals translation termination. The 3’ UTR of Sec-containing genes have a common stem-loop structure, the sec insertion sequence (SECIS), which is necessary for the recognition of UGA as a Sec codon rather than as a stop signal. 2878 ENSG00000211445 GPX3
PHD finger protein 7 Spermatogenesis is a complex process regulated by extracellular and intracellular factors as well as cellular interactions among interstitial cells of the testis, Sertoli cells, and germ cells. This gene is expressed in the testis in Sertoli cells but not germ cells. The protein encoded by this gene contains plant homeodomain (PHD) finger domains, also known as leukemia associated protein (LAP) domains, believed to be involved in transcriptional regulation. The protein, which localizes to the nucleus of transfected cells, has been implicated in the transcriptional regulation of spermatogenesis. Alternate splicing results in multiple transcript variants of this gene. 51533 ENSG00000010318 PHF7
amyloid beta precursor like protein 2 This gene encodes amyloid precursor-like protein 2 (APLP2), which is a member of the APP (amyloid precursor protein) family including APP, APLP1 and APLP2. This protein is ubiquitously expressed. It contains heparin-, copper- and zinc-binding domains at the N-terminus, BPTI/Kunitz inhibitor and E2 domains in the middle region, and transmembrane and intracellular domains at the C-terminus. This protein interacts with major histocompatibility complex (MHC) class I molecules. The synergy of this protein and the APP is required to mediate neuromuscular transmission, spatial learning and synaptic plasticity. This protein has been implicated in the pathogenesis of Alzheimer’s disease. Multiple alternatively spliced transcript variants encoding different isoforms have been identified. 334 ENSG00000084234 APLP2
troponin T2, cardiac type The protein encoded by this gene is the tropomyosin-binding subunit of the troponin complex, which is located on the thin filament of striated muscles and regulates muscle contraction in response to alterations in intracellular calcium ion concentration. Mutations in this gene have been associated with familial hypertrophic cardiomyopathy as well as with dilated cardiomyopathy. Transcripts for this gene undergo alternative splicing that results in many tissue-specific isoforms; however, the full-length nature of some of these variants has not yet been determined. 7139 ENSG00000118194 TNNT2
filamin C This gene encodes one of three related filamin genes, specifically gamma filamin. These filamin proteins crosslink actin filaments into orthogonal networks in cortical cytoplasm and participate in the anchoring of membrane proteins for the actin cytoskeleton. Three functional domains exist in filamin: an N-terminal filamentous actin-binding domain, a C-terminal self-association domain, and a membrane glycoprotein-binding domain. Two transcript variants encoding different isoforms have been found for this gene. 2318 ENSG00000128591 FLNC
tripartite motif containing 63 This gene encodes a member of the RING zinc finger protein family found in striated muscle and iris. The product of this gene is an E3 ubiquitin ligase that localizes to the Z-line and M-line lattices of myofibrils. This protein plays an important role in the atrophy of skeletal and cardiac muscle and is required for the degradation of myosin heavy chain proteins, myosin light chain, myosin binding protein, and for muscle-type creatine kinase. 84676 ENSG00000158022 TRIM63
integral membrane protein 2B Amyloid precursor proteins are processed by beta-secretase and gamma-secretase to produce beta-amyloid peptides which form the characteristic plaques of Alzheimer disease. This gene encodes a transmembrane protein which is processed at the C-terminus by furin or furin-like proteases to produce a small secreted peptide which inhibits the deposition of beta-amyloid. Mutations which result in extension of the C-terminal end of the encoded protein, thereby increasing the size of the secreted peptide, are associated with two neurodegenerative diseases, familial British dementia and familial Danish dementia. 9445 ENSG00000136156 ITM2B
CD81 molecule The protein encoded by this gene is a member of the transmembrane 4 superfamily, also known as the tetraspanin family. Most of these members are cell-surface proteins that are characterized by the presence of four hydrophobic domains. The proteins mediate signal transduction events that play a role in the regulation of cell development, activation, growth and motility. This encoded protein is a cell surface glycoprotein that is known to complex with integrins. This protein appears to promote muscle cell fusion and support myotube maintenance. Also it may be involved in signal transduction. This gene is localized in the tumor-suppressor gene region and thus it is a candidate gene for malignancies. Two transcript variants encoding different isoforms have been found for this gene. 975 ENSG00000110651 CD81
prothymosin, alpha NA 5757 ENSG00000187514 PTMA
prothymosin alpha-like NA 728026 ENSG00000187514 LOC728026
KIAA0754 NA 643314 ENSG00000127603 KIAA0754
microtubule-actin crosslinking factor 1 This gene encodes a large protein containing numerous spectrin and leucine-rich repeat (LRR) domains. The encoded protein is a member of a family of proteins that form bridges between different cytoskeletal elements. This protein facilitates actin-microtubule interactions at the cell periphery and couples the microtubule network to cellular junctions. Alternative splicing results in multiple transcript variants, but the full-length nature of some of these variants has not been determined. 23499 ENSG00000127603 MACF1
eukaryotic translation initiation factor 4 gamma 2 Translation initiation is mediated by specific recognition of the cap structure by eukaryotic translation initiation factor 4F (eIF4F), which is a cap binding protein complex that consists of three subunits: eIF4A, eIF4E and eIF4G. The protein encoded by this gene shares similarity with the C-terminal region of eIF4G that contains the binding sites for eIF4A and eIF3; eIF4G, in addition, contains a binding site for eIF4E at the N-terminus. Unlike eIF4G, which supports cap-dependent and independent translation, this gene product functions as a general repressor of translation by forming translationally inactive complexes. In vitro and in vivo studies indicate that translation of this mRNA initiates exclusively at a non-AUG (GUG) codon. Alternatively spliced transcript variants encoding different isoforms of this gene have been described. 1982 ENSG00000110321 EIF4G2
four and a half LIM domains 3 The protein encoded by this gene is a member of a family of proteins containing a four-and-a-half LIM domain, which is a highly conserved double zinc finger motif. The encoded protein has been shown to interact with the cancer developmental regulators SMAD2, SMAD3, and SMAD4, the skeletal muscle myogenesis protein MyoD, and the high-affinity IgE beta chain regulator MZF-1. This protein may be involved in tumor suppression, repression of MyoD expression, and repression of IgE receptor expression. Two transcript variants encoding different isoforms have been found for this gene. 2275 ENSG00000183386 FHL3
ras homolog family member A This gene encodes a member of the Rho family of small GTPases, which cycle between inactive GDP-bound and active GTP-bound states and function as molecular switches in signal transduction cascades. Rho proteins promote reorganization of the actin cytoskeleton and regulate cell shape, attachment, and motility. Overexpression of this gene is associated with tumor cell proliferation and metastasis. Multiple alternatively spliced variants have been identified. 387 ENSG00000067560 RHOA
ras homolog family member B NA 388 ENSG00000143878 RHOB
talin 1 This gene encodes a cytoskeletal protein that is concentrated in areas of cell-substratum and cell-cell contacts. The encoded protein plays a significant role in the assembly of actin filaments and in spreading and migration of various cell types, including fibroblasts and osteoclasts. It codistributes with integrins in the cell surface membrane in order to assist in the attachment of adherent cells to extracellular matrices and of lymphocytes to other cells. The N-terminus of this protein contains elements for localization to cell-extracellular matrix junctions. The C-terminus contains binding sites for proteins such as beta-1-integrin, actin, and vinculin. 7094 ENSG00000137076 TLN1
eukaryotic translation initiation factor 4A1 NA 1973 ENSG00000161960 EIF4A1
amyloid beta precursor protein This gene encodes a cell surface receptor and transmembrane precursor protein that is cleaved by secretases to form a number of peptides. Some of these peptides are secreted and can bind to the acetyltransferase complex APBB1/TIP60 to promote transcriptional activation, while others form the protein basis of the amyloid plaques found in the brains of patients with Alzheimer disease. In addition, two of the peptides are antimicrobial peptides, having been shown to have bactericidal and antifungal activities. Mutations in this gene have been implicated in autosomal dominant Alzheimer disease and cerebroarterial amyloidosis (cerebral amyloid angiopathy). Multiple transcript variants encoding several different isoforms have been found for this gene. 351 ENSG00000142192 APP
splicing factor 3b subunit 1 This gene encodes subunit 1 of the splicing factor 3b protein complex. Splicing factor 3b, together with splicing factor 3a and a 12S RNA unit, forms the U2 small nuclear ribonucleoproteins complex (U2 snRNP). The splicing factor 3b/3a complex binds pre-mRNA upstream of the intron’s branch site in a sequence independent manner and may anchor the U2 snRNP to the pre-mRNA. Splicing factor 3b is also a component of the minor U12-type spliceosome. The carboxy-terminal two-thirds of subunit 1 have 22 non-identical, tandem HEAT repeats that form rod-like, helical structures. Alternative splicing results in multiple transcript variants encoding different isoforms. 23451 ENSG00000115524 SF3B1
lysosomal associated membrane protein 1 The protein encoded by this gene is a member of a family of membrane glycoproteins. This glycoprotein provides selectins with carbohydrate ligands. It may also play a role in tumor cell metastasis. 3916 ENSG00000185896 LAMP1
heterogeneous nuclear ribonucleoprotein U This gene belongs to the subfamily of ubiquitously expressed heterogeneous nuclear ribonucleoproteins (hnRNPs). The hnRNPs are RNA binding proteins and they form complexes with heterogeneous nuclear RNA (hnRNA). These proteins are associated with pre-mRNAs in the nucleus and appear to influence pre-mRNA processing and other aspects of mRNA metabolism and transport. While all of the hnRNPs are present in the nucleus, some seem to shuttle between the nucleus and the cytoplasm. The hnRNP proteins have distinct nucleic acid binding properties. The protein encoded by this gene contains an RNA binding domain and scaffold-associated region (SAR)-specific bipartite DNA-binding domain. This protein is also thought to be involved in the packaging of hnRNA into large ribonucleoprotein complexes. During apoptosis, this protein is cleaved in a caspase-dependent way. Cleavage occurs at the SALD site, resulting in a loss of DNA-binding activity and a concomitant detachment of this protein from nuclear structural sites. But this cleavage does not affect the function of the encoded protein in RNA metabolism. At least two alternatively spliced transcript variants have been identified for this gene. 3192 ENSG00000153187 HNRNPU
trans-golgi network protein 2 This gene encodes a type I integral membrane protein that is localized to the trans-Golgi network, a major sorting station for secretory and membrane proteins. The encoded protein cycles between early endosomes and the trans-Golgi network, and may play a role in exocytic vesicle formation. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene. 10618 ENSG00000152291 TGOLN2
calnexin This gene encodes a member of the calnexin family of molecular chaperones. The encoded protein is a calcium-binding, endoplasmic reticulum (ER)-associated protein that interacts transiently with newly synthesized N-linked glycoproteins, facilitating protein folding and assembly. It may also play a central role in the quality control of protein folding by retaining incorrectly folded protein subunits within the ER for degradation. Alternatively spliced transcript variants encoding the same protein have been described. 821 ENSG00000127022 CANX
Y-box binding protein 1 This gene encodes a highly conserved cold shock domain protein that has broad nucleic acid binding properties. The encoded protein functions as both a DNA and RNA binding protein and has been implicated in numerous cellular processes including regulation of transcription and translation, pre-mRNA splicing, DNA reparation and mRNA packaging. This protein is also a component of messenger ribonucleoprotein (mRNP) complexes and may have a role in microRNA processing. This protein can be secreted through non-classical pathways and functions as an extracellular mitogen. Aberrant expression of the gene is associated with cancer proliferation in numerous tissues. This gene may be a prognostic marker for poor outcome and drug resistance in certain cancers. Alternate splicing results in multiple transcript variants. Pseudogenes of this gene are found on multiple chromosomes. 4904 ENSG00000065978 YBX1
decorin This gene encodes a member of the small leucine-rich proteoglycan family of proteins. Alternative splicing results in multiple transcript variants, at least one of which encodes a preproprotein that is proteolytically processed to generate the mature protein. This protein plays a role in collagen fibril assembly. Binding of this protein to multiple cell surface receptors mediates its role in tumor suppression, including a stimulatory effect on autophagy and inflammation and an inhibitory effect on angiogenesis and tumorigenesis. This gene and the related gene biglycan are thought to be the result of a gene duplication. Mutations in this gene are associated with congenital stromal corneal dystrophy in human patients. 1634 ENSG00000011465 DCN
spectrin alpha, non-erythrocytic 1 Spectrins are a family of filamentous cytoskeletal proteins that function as essential scaffold proteins that stabilize the plasma membrane and organize intracellular organelles. Spectrins are composed of alpha and beta dimers that associate to form tetramers linked in a head-to-head arrangement. This gene encodes an alpha spectrin that is specifically expressed in nonerythrocytic cells. The encoded protein has been implicated in other cellular functions including DNA repair and cell cycle regulation. Mutations in this gene are the cause of early infantile epileptic encephalopathy-5. Alternate splicing results in multiple transcript variants. 6709 ENSG00000197694 SPTAN1
cyclin I The protein encoded by this gene belongs to the highly conserved cyclin family, whose members are characterized by a dramatic periodicity in protein abundance through the cell cycle. Cyclins function as regulators of CDK kinases. Different cyclins exhibit distinct expression and degradation patterns which contribute to the temporal coordination of each mitotic event. This cyclin shows the highest similarity with cyclin G. The transcript of this gene was found to be expressed constantly during cell cycle progression. The function of this cyclin has not yet been determined. 10983 ENSG00000118816 CCNI
myosin binding protein C, cardiac MYBPC3 encodes the cardiac isoform of myosin-binding protein C. Myosin-binding protein C is a myosin-associated protein found in the cross-bridge-bearing zone (C region) of A bands in striated muscle. MYBPC3, the cardiac isoform, is expressed exclusively in heart muscle. Regulatory phosphorylation of the cardiac isoform in vivo by cAMP-dependent protein kinase (PKA) upon adrenergic stimulation may be linked to modulation of cardiac contraction. Mutations in MYBPC3 are one cause of familial hypertrophic cardiomyopathy. 4607 ENSG00000134571 MYBPC3
S100 calcium binding protein A9 The protein encoded by this gene is a member of the S100 family of proteins containing 2 EF-hand calcium-binding motifs. S100 proteins are localized in the cytoplasm and/or nucleus of a wide range of cells, and involved in the regulation of a number of cellular processes such as cell cycle progression and differentiation. S100 genes include at least 13 members which are located as a cluster on chromosome 1q21. This protein may function in the inhibition of casein kinase and altered expression of this protein is associated with the disease cystic fibrosis. This antimicrobial protein exhibits antifungal and antibacterial activity. 6280 ENSG00000163220 S100A9
alpha-2-macroglobulin Alpha-2-macroglobulin is a protease inhibitor and cytokine transporter. It inhibits many proteases, including trypsin, thrombin and collagenase. A2M is implicated in Alzheimer disease (AD) due to its ability to mediate the clearance and degradation of A-beta, the major component of beta-amyloid deposits. 2 ENSG00000175899 A2M
AHNAK nucleoprotein NA 79026 ENSG00000124942 AHNAK
myosin light chain 4 Myosin is a hexameric ATPase cellular motor protein. It is composed of two myosin heavy chains, two nonphosphorylatable myosin alkali light chains, and two phosphorylatable myosin regulatory light chains. This gene encodes a myosin alkali light chain that is found in embryonic muscle and adult atria. Two alternatively spliced transcript variants encoding the same protein have been found for this gene. 4635 ENSG00000198336 MYL4
actinin alpha 4 Alpha actinins belong to the spectrin gene superfamily which represents a diverse group of cytoskeletal proteins, including the alpha and beta spectrins and dystrophins. Alpha actinin is an actin-binding protein with multiple roles in different cell types. In nonmuscle cells, the cytoskeletal isoform is found along microfilament bundles and adherens-type junctions, where it is involved in binding actin to the membrane. In contrast, skeletal, cardiac, and smooth muscle isoforms are localized to the Z-disc and analogous dense bodies, where they help anchor the myofibrillar actin filaments. This gene encodes a nonmuscle, alpha actinin isoform which is concentrated in the cytoplasm, and thought to be involved in metastatic processes. Mutations in this gene have been associated with focal and segmental glomerulosclerosis. 81 ENSG00000130402 ACTN4
heterogeneous nuclear ribonucleoprotein K This gene belongs to the subfamily of ubiquitously expressed heterogeneous nuclear ribonucleoproteins (hnRNPs). The hnRNPs are RNA binding proteins and they complex with heterogeneous nuclear RNA (hnRNA). These proteins are associated with pre-mRNAs in the nucleus and appear to influence pre-mRNA processing and other aspects of mRNA metabolism and transport. While all of the hnRNPs are present in the nucleus, some seem to shuttle between the nucleus and the cytoplasm. The hnRNP proteins have distinct nucleic acid binding properties. The protein encoded by this gene is located in the nucleoplasm and has three repeats of KH domains that bind to RNAs. It is distinct among other hnRNP proteins in its binding preference; it binds tenaciously to poly(C). This protein is also thought to have a role during cell cycle progression. Several alternatively spliced transcript variants have been described for this gene; however, not all of them are fully characterized. 3190 ENSG00000165119 HNRNPK
uncharacterized LOC100507537 NA 100507537 ENSG00000240045 LOC100507537
nischarin This gene encodes a nonadrenergic imidazoline-1 receptor protein that localizes to the cytosol and anchors to the inner layer of the plasma membrane. The orthologous mouse protein has been shown to influence cytoskeletal organization and cell migration by binding to alpha-5-beta-1 integrin. In humans, this protein has been shown to bind to the adapter insulin receptor substrate 4 (IRS4) to mediate translocation of alpha-5 integrin from the cell membrane to endosomes. Expression of this protein was reduced in human breast cancers while its overexpression reduced tumor growth and metastasis, possibly by limiting the expression of alpha-5 integrin. In human cardiac tissue, this gene was found to affect cell growth and death while in neural tissue it affected neuronal growth and differentiation. Alternative splicing results in multiple transcript variants encoding different isoforms. Some isoforms lack the expected C-terminal domains of a functional imidazoline receptor. 11188 ENSG00000010322 NISCH
poly(rC) binding protein 2 The protein encoded by this gene appears to be multifunctional. Along with PCBP-1 and hnRNPK, it is one of the major cellular poly(rC)-binding proteins. The encoded protein contains three K-homologous (KH) domains which may be involved in RNA binding. Together with PCBP-1, this protein also functions as a translational coactivator of poliovirus RNA via a sequence-specific interaction with stem-loop IV of the IRES, promoting poliovirus RNA replication by binding to its 5’-terminal cloverleaf structure. It has also been implicated in translational control of the 15-lipoxygenase mRNA, human papillomavirus type 16 L2 mRNA, and hepatitis A virus RNA. The encoded protein is also suggested to play a part in formation of a sequence-specific alpha-globin mRNP complex which is associated with alpha-globin mRNA stability. This multiexon structural mRNA is thought to be retrotransposed to generate PCBP-1, an intronless gene with functions similar to that of PCBP2. This gene and PCBP-1 have paralogous genes (PCBP3 and PCBP4) which are thought to have arisen as a result of duplication events of entire genes. This gene also has two processed pseudogenes (PCBP2P1 and PCBP2P2). Multiple transcript variants encoding different isoforms have been found for this gene. 5094 ENSG00000197111 PCBP2
eukaryotic translation elongation factor 1 alpha 1 pseudogene 6 NA ENSG00000233476 ENSG00000233476 EEF1A1P6
polymerase (RNA) II subunit A This gene encodes the largest subunit of RNA polymerase II, the polymerase responsible for synthesizing messenger RNA in eukaryotes. The product of this gene contains a carboxy terminal domain composed of heptapeptide repeats that are essential for polymerase activity. These repeats contain serine and threonine residues that are phosphorylated in actively transcribing RNA polymerase. In addition, this subunit, in combination with several other polymerase subunits, forms the DNA binding domain of the polymerase, a groove in which the DNA template is transcribed into RNA. 5430 ENSG00000181222 POLR2A
write.table(as.factor(out$query),
            paste0("../utilities/GTEX2013_sparse_load_sqrt/gene_names_clus_", 19, ".txt"),
            col.names = FALSE, row.names = FALSE, quote = FALSE);
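
In the Factor 19 table above, a single Ensembl ID can appear on more than one row (for example, ENSG00000187514 is listed for both PTMA and LOC728026), so writing `out$query` directly may repeat IDs in the output file. The following is a minimal sketch of one way to write each ID once. The helper name `write_factor_genes` and its arguments are hypothetical and not part of the original analysis, and the toy data frame only mimics the `query` and `symbol` columns of a `queryMany` result.

```r
# Hypothetical helper (an assumption, not the original analysis code):
# write the unique Ensembl IDs from a queryMany()-style result for one factor.
write_factor_genes <- function(out, factor_index,
                               out_dir = "../utilities/GTEX2013_sparse_load_sqrt") {
  out <- as.data.frame(out)
  # One Ensembl ID can map to several annotation records; keep each ID once,
  # preserving the order in which the IDs first appear.
  ids <- unique(as.character(out$query))
  path <- file.path(out_dir, paste0("gene_names_clus_", factor_index, ".txt"))
  # One ID per line, no header and no quotes.
  writeLines(ids, path)
  invisible(ids)
}

# Toy input with a duplicated query ID, written to a temporary directory.
toy <- data.frame(
  query  = c("ENSG00000187514", "ENSG00000187514", "ENSG00000127603"),
  symbol = c("PTMA", "LOC728026", "MACF1"),
  stringsAsFactors = FALSE
)
ids <- write_factor_genes(toy, 19, out_dir = tempdir())
```

Whether duplicates should be removed depends on how the downstream tool reads the gene list; if it counts rows, the original behaviour may be what is wanted.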

Factor 20 Annotations

out <- mygene::queryMany(gene_list[20,],  scopes="ensembl.gene", fields=c("name", "summary", "symbol"), species="human");
## Finished
## Pass returnall=TRUE to return lists of duplicate or missing query terms.
kable(as.data.frame(out))
X_id symbol summary query name notfound
3858 KRT10 This gene encodes a member of the type I (acidic) cytokeratin family, which belongs to the superfamily of intermediate filament (IF) proteins. Keratins are heteropolymeric structural proteins which form the intermediate filament. These filaments, along with actin microfilaments and microtubules, compose the cytoskeleton of epithelial cells. Mutations in this gene are associated with epidermolytic hyperkeratosis. This gene is located within a cluster of keratin family members on chromosome 17q21. ENSG00000186395 keratin 10 NA
3848 KRT1 The protein encoded by this gene is a member of the keratin gene family. The type II cytokeratins consist of basic or neutral proteins which are arranged in pairs of heterotypic keratin chains coexpressed during differentiation of simple and stratified epithelial tissues. This type II cytokeratin is specifically expressed in the spinous and granular layers of the epidermis with family member KRT10 and mutations in these genes have been associated with bullous congenital ichthyosiform erythroderma. The type II cytokeratins are clustered in a region of chromosome 12q12-q13. ENSG00000167768 keratin 1 NA
4155 MBP The protein encoded by the classic MBP gene is a major constituent of the myelin sheath of oligodendrocytes and Schwann cells in the nervous system. However, MBP-related transcripts are also present in the bone marrow and the immune system. These mRNAs arise from the long MBP gene (otherwise called ‘Golli-MBP’) that contains 3 additional exons located upstream of the classic MBP exons. Alternative splicing from the Golli and the MBP transcription start sites gives rise to 2 sets of MBP-related transcripts and gene products. The Golli mRNAs contain 3 exons unique to Golli-MBP, spliced in-frame to 1 or more MBP exons. They encode hybrid proteins that have N-terminal Golli aa sequence linked to MBP aa sequence. The second family of transcripts contain only MBP exons and produce the well characterized myelin basic proteins. This complex gene structure is conserved among species suggesting that the MBP transcription unit is an integral part of the Golli transcription unit and that this arrangement is important for the function and/or regulation of these genes. ENSG00000197971 myelin basic protein NA
3849 KRT2 The protein encoded by this gene is a member of the keratin gene family. The type II cytokeratins consist of basic or neutral proteins which are arranged in pairs of heterotypic keratin chains coexpressed during differentiation of simple and stratified epithelial tissues. This type II cytokeratin is expressed largely in the upper spinous layer of epidermal keratinocytes and mutations in this gene have been associated with bullous congenital ichthyosiform erythroderma. The type II cytokeratins are clustered in a region of chromosome 12q12-q13. ENSG00000172867 keratin 2 NA
5620 PRM2 Protamines substitute for histones in the chromatin of sperm during the haploid phase of spermatogenesis, and are the major DNA-binding proteins in the nucleus of sperm in many vertebrates. They package the sperm DNA into a highly condensed complex in a volume less than 5% of a somatic cell nucleus. Many mammalian species have only one protamine (protamine 1); however, a few species, including human and mouse, have two. This gene encodes protamine 2, which is cleaved to give rise to a family of protamine 2 peptides. Alternatively spliced transcript variants have also been found for this gene. ENSG00000122304 protamine 2 NA
ENSG00000266844 RP11-862L9.3 NA ENSG00000266844 NA NA
3860 KRT13 The protein encoded by this gene is a member of the keratin gene family. The keratins are intermediate filament proteins responsible for the structural integrity of epithelial cells and are subdivided into cytokeratins and hair keratins. Most of the type I cytokeratins consist of acidic proteins which are arranged in pairs of heterotypic keratin chains. This type I cytokeratin is paired with keratin 4 and expressed in the suprabasal layers of non-cornified stratified epithelia. Mutations in this gene and keratin 4 have been associated with the autosomal dominant disorder White Sponge Nevus. The type I cytokeratins are clustered in a region of chromosome 17q21.2. Alternative splicing of this gene results in multiple transcript variants; however, not all variants have been described. ENSG00000171401 keratin 13 NA
64065 PERP NA ENSG00000112378 PERP, TP53 apoptosis effector NA
93099 DMKN This gene is upregulated in inflammatory diseases, and it was first observed as expressed in the differentiated layers of skin. The most interesting aspect of this gene is the differential use of promoters and terminators to generate isoforms with unique cellular distributions and domain components. Alternatively spliced transcript variants encoding different isoforms have been identified for this gene. ENSG00000161249 dermokine NA
5619 PRM1 NA ENSG00000175646 protamine 1 NA
1674 DES This gene encodes a muscle-specific class III intermediate filament. Homopolymers of this protein form a stable intracytoplasmic filamentous network connecting myofibrils to each other and to the plasma membrane. Mutations in this gene are associated with desmin-related myopathy, a familial cardiac and skeletal myopathy (CSM), and with distal myopathies. ENSG00000175084 desmin NA
3852 KRT5 The protein encoded by this gene is a member of the keratin gene family. The type II cytokeratins consist of basic or neutral proteins which are arranged in pairs of heterotypic keratin chains coexpressed during differentiation of simple and stratified epithelial tissues. This type II cytokeratin is specifically expressed in the basal layer of the epidermis with family member KRT14. Mutations in these genes have been associated with a complex of diseases termed epidermolysis bullosa simplex. The type II cytokeratins are clustered in a region of chromosome 12q12-q13. ENSG00000186081 keratin 5 NA
51806 CALML5 This gene encodes a novel calcium binding protein expressed in the epidermis and related to the calmodulin family of calcium binding proteins. Functional studies with recombinant protein demonstrate it does bind calcium and undergoes a conformational change when it does so. Abundant expression is detected only in reconstructed epidermis and is restricted to differentiating keratinocytes. In addition, it can associate with transglutaminase 3, shown to be a key enzyme in the terminal differentiation of keratinocytes. ENSG00000178372 calmodulin like 5 NA
3861 KRT14 This gene encodes a member of the keratin family, the most diverse group of intermediate filaments. This gene product, a type I keratin, is usually found as a heterotetramer with two keratin 5 molecules, a type II keratin. Together they form the cytoskeleton of epithelial cells. Mutations in the genes for these keratins are associated with epidermolysis bullosa simplex. At least one pseudogene has been identified at 17p12-p11. ENSG00000186847 keratin 14 NA
5166 PDK4 This gene is a member of the PDK/BCKDK protein kinase family and encodes a mitochondrial protein with a histidine kinase domain. This protein is located in the matrix of the mitochondria and inhibits the pyruvate dehydrogenase complex by phosphorylating one of its subunits, thereby contributing to the regulation of glucose metabolism. Expression of this gene is regulated by glucocorticoids, retinoic acid and insulin. ENSG00000004799 pyruvate dehydrogenase kinase 4 NA
283131 NEAT1 This gene produces a long non-coding RNA (lncRNA) transcribed from the multiple endocrine neoplasia locus. This lncRNA is retained in the nucleus where it forms the core structural component of the paraspeckle sub-organelles. It may act as a transcriptional regulator for numerous genes, including some genes involved in cancer progression. ENSG00000245532 nuclear paraspeckle assembly transcript 1 (non-protein coding) NA
51533 PHF7 Spermatogenesis is a complex process regulated by extracellular and intracellular factors as well as cellular interactions among interstitial cells of the testis, Sertoli cells, and germ cells. This gene is expressed in the testis in Sertoli cells but not germ cells. The protein encoded by this gene contains plant homeodomain (PHD) finger domains, also known as leukemia associated protein (LAP) domains, believed to be involved in transcriptional regulation. The protein, which localizes to the nucleus of transfected cells, has been implicated in the transcriptional regulation of spermatogenesis. Alternate splicing results in multiple transcript variants of this gene. ENSG00000010318 PHD finger protein 7 NA
7168 TPM1 This gene is a member of the tropomyosin family of highly conserved, widely distributed actin-binding proteins involved in the contractile system of striated and smooth muscles and the cytoskeleton of non-muscle cells. Tropomyosin is composed of two alpha-helical chains arranged as a coiled-coil. It is polymerized end to end along the two grooves of actin filaments and provides stability to the filaments. The encoded protein is one type of alpha helical chain that forms the predominant tropomyosin of striated muscle, where it also functions in association with the troponin complex to regulate the calcium-dependent interaction of actin and myosin during muscle contraction. In smooth muscle and non-muscle cells, alternatively spliced transcript variants encoding a range of isoforms have been described. Mutations in this gene are associated with type 3 familial hypertrophic cardiomyopathy. ENSG00000140416 tropomyosin 1 (alpha) NA
58473 PLEKHB1 NA ENSG00000021300 pleckstrin homology domain containing B1 NA
3851 KRT4 The protein encoded by this gene is a member of the keratin gene family. The type II cytokeratins consist of basic or neutral proteins which are arranged in pairs of heterotypic keratin chains coexpressed during differentiation of simple and stratified epithelial tissues. This type II cytokeratin is specifically expressed in differentiated layers of the mucosal and esophageal epithelia with family member KRT13. Mutations in these genes have been associated with White Sponge Nevus, characterized by oral, esophageal, and anal leukoplakia. The type II cytokeratins are clustered in a region of chromosome 12q12-q13. ENSG00000170477 keratin 4 NA
2670 GFAP This gene encodes one of the major intermediate filament proteins of mature astrocytes. It is used as a marker to distinguish astrocytes from other glial cells during development. Mutations in this gene cause Alexander disease, a rare disorder of astrocytes in the central nervous system. Alternative splicing results in multiple transcript variants encoding distinct isoforms. ENSG00000131095 glial fibrillary acidic protein NA
4014 LOR This gene encodes loricrin, a major protein component of the cornified cell envelope found in terminally differentiated epidermal cells. Mutations in this gene are associated with Vohwinkel’s syndrome and progressive symmetric erythrokeratoderma, both inherited skin diseases. ENSG00000203782 loricrin NA
7178 TPT1 NA ENSG00000133112 tumor protein, translationally-controlled 1 NA
60 ACTB This gene encodes one of six different actin proteins. Actins are highly conserved proteins that are involved in cell motility, structure, and integrity. This actin is a major constituent of the contractile apparatus and one of the two nonmuscle cytoskeletal actins. ENSG00000075624 actin, beta NA
1471 CST3 The cystatin superfamily encompasses proteins that contain multiple cystatin-like sequences. Some of the members are active cysteine protease inhibitors, while others have lost or perhaps never acquired this inhibitory activity. There are three inhibitory families in the superfamily, including the type 1 cystatins (stefins), type 2 cystatins and the kininogens. The type 2 cystatin proteins are a class of cysteine proteinase inhibitors found in a variety of human fluids and secretions, where they appear to provide protective functions. The cystatin locus on chromosome 20 contains the majority of the type 2 cystatin genes and pseudogenes. This gene is located in the cystatin locus and encodes the most abundant extracellular inhibitor of cysteine proteases, which is found in high concentrations in biological fluids and is expressed in virtually all organs of the body. A mutation in this gene has been associated with amyloid angiopathy. Expression of this protein in vascular wall smooth muscle cells is severely reduced in both atherosclerotic and aneurysmal aortic lesions, establishing its role in vascular disease. In addition, this protein has been shown to have an antimicrobial function, inhibiting the replication of herpes simplex virus. Alternative splicing results in multiple transcript variants encoding a single protein. ENSG00000101439 cystatin C NA
7314 UBB This gene encodes ubiquitin, one of the most conserved proteins known. Ubiquitin has a major role in targeting cellular proteins for degradation by the 26S proteasome. It is also involved in the maintenance of chromatin structure, the regulation of gene expression, and the stress response. Ubiquitin is synthesized as a precursor protein consisting of either polyubiquitin chains or a single ubiquitin moiety fused to an unrelated protein. This gene consists of three direct repeats of the ubiquitin coding sequence with no spacer sequence. Consequently, the protein is expressed as a polyubiquitin precursor with a final amino acid after the last repeat. An aberrant form of this protein has been detected in patients with Alzheimer’s disease and Down syndrome. Pseudogenes of this gene are located on chromosomes 1, 2, 13, and 17. Alternative splicing results in multiple transcript variants. ENSG00000170315 ubiquitin B NA
388533 KRTDAP This gene encodes a protein which may function in the regulation of keratinocyte differentiation and maintenance of stratified epithelia. Multiple transcript variants encoding different isoforms have been found for this gene. ENSG00000188508 keratinocyte differentiation associated protein NA
222166 MTURN NA ENSG00000180354 maturin, neural progenitor differentiation regulator homolog (Xenopus) NA
7169 TPM2 This gene encodes beta-tropomyosin, a member of the actin filament binding protein family, and mainly expressed in slow, type 1 muscle fibers. Mutations in this gene can alter the expression of other sarcomeric tropomyosin proteins, and cause cap disease, nemaline myopathy and distal arthrogryposis syndromes. Alternatively spliced transcript variants encoding different isoforms have been found for this gene. ENSG00000198467 tropomyosin 2 (beta) NA
5317 PKP1 This gene encodes a member of the arm-repeat (armadillo) and plakophilin gene families. Plakophilin proteins contain numerous armadillo repeats, localize to cell desmosomes and nuclei, and participate in linking cadherins to intermediate filaments in the cytoskeleton. This protein may be involved in molecular recruitment and stabilization during desmosome formation. Mutations in this gene have been associated with the ectodermal dysplasia/skin fragility syndrome. Two transcript variants encoding different isoforms have been found for this gene. ENSG00000081277 plakophilin 1 NA
682 BSG The protein encoded by this gene is a plasma membrane protein that is important in spermatogenesis, embryo implantation, neural network formation, and tumor progression. The encoded protein is also a member of the immunoglobulin superfamily. Multiple transcript variants encoding different isoforms have been found for this gene. ENSG00000172270 basigin (Ok blood group) NA
2335 FN1 This gene encodes fibronectin, a glycoprotein present in a soluble dimeric form in plasma, and in a dimeric or multimeric form at the cell surface and in extracellular matrix. The encoded preproprotein is proteolytically processed to generate the mature protein. Fibronectin is involved in cell adhesion and migration processes including embryogenesis, wound healing, blood coagulation, host defense, and metastasis. The gene has three regions subject to alternative splicing, with the potential to produce 20 different transcript variants, at least one of which encodes an isoform that undergoes proteolytic processing. The full-length nature of some variants has not been determined. ENSG00000115414 fibronectin 1 NA
117159 DCD This antimicrobial gene encodes a secreted protein that is subsequently processed into mature peptides of distinct biological activities. The C-terminal peptide is constitutively expressed in sweat and has antibacterial and antifungal activities. The N-terminal peptide, also known as diffusible survival evasion peptide, promotes neural cell survival under conditions of severe oxidative stress. A glycosylated form of the N-terminal peptide may be associated with cachexia (muscle wasting) in cancer patients. Alternative splicing results in multiple transcript variants encoding different isoforms. ENSG00000161634 dermcidin NA
2023 ENO1 This gene encodes alpha-enolase, one of three enolase isoenzymes found in mammals. Each isoenzyme is a homodimer composed of 2 alpha, 2 gamma, or 2 beta subunits, and functions as a glycolytic enzyme. Alpha-enolase in addition, functions as a structural lens protein (tau-crystallin) in the monomeric form. Alternative splicing of this gene results in a shorter isoform that has been shown to bind to the c-myc promoter and function as a tumor suppressor. Several pseudogenes have been identified, including one on the long arm of chromosome 1. Alpha-enolase has also been identified as an autoantigen in Hashimoto encephalopathy. ENSG00000074800 enolase 1 NA
65108 MARCKSL1 This gene encodes a member of the myristoylated alanine-rich C-kinase substrate (MARCKS) family. Members of this family play a role in cytoskeletal regulation, protein kinase C signaling and calmodulin signaling. The encoded protein affects the formation of adherens junction. Alternative splicing results in multiple transcript variants. Pseudogenes of this gene are located on the long arm of chromosomes 6 and 10. ENSG00000175130 MARCKS like 1 NA
6707 SPRR3 NA ENSG00000163209 small proline rich protein 3 NA
10409 BASP1 This gene encodes a membrane bound protein with several transient phosphorylation sites and PEST motifs. Conservation of proteins with PEST sequences among different species supports their functional significance. PEST sequences typically occur in proteins with high turnover rates. Immunological characteristics of this protein are species specific. This protein also undergoes N-terminal myristoylation. Alternative splicing results in multiple transcript variants that encode the same protein. ENSG00000176788 brain abundant membrane attached signal protein 1 NA
2879 GPX4 This gene encodes a member of the glutathione peroxidase protein family. Glutathione peroxidase catalyzes the reduction of hydrogen peroxide, organic hydroperoxide, and lipid peroxides by reduced glutathione and functions in the protection of cells against oxidative damage. Human plasma glutathione peroxidase has been shown to be a selenium-containing enzyme and the UGA codon is translated into a selenocysteine. The encoded protein has been identified as a moonlighting protein based on its ability to serve dual functions as a peroxidase as well as a structural protein in mature spermatozoa. Through alternative splicing and transcription initiation, rat produces proteins that localize to the nucleus, mitochondrion, and cytoplasm. In humans, alternative transcription initiation and the cleavage sites of the mitochondrial and nuclear transit peptides need to be experimentally verified. Alternative splicing results in multiple transcript variants. ENSG00000167468 glutathione peroxidase 4 NA
2778 GNAS This locus has a highly complex imprinted expression pattern. It gives rise to maternally, paternally, and biallelically expressed transcripts that are derived from four alternative promoters and 5’ exons. Some transcripts contain a differentially methylated region (DMR) at their 5’ exons, and this DMR is commonly found in imprinted genes and correlates with transcript expression. An antisense transcript is produced from an overlapping locus on the opposite strand. One of the transcripts produced from this locus, and the antisense transcript, are paternally expressed noncoding RNAs, and may regulate imprinting in this region. In addition, one of the transcripts contains a second overlapping ORF, which encodes a structurally unrelated protein - Alex. Alternative splicing of downstream exons is also observed, which results in different forms of the stimulatory G-protein alpha subunit, a key element of the classical signal transduction pathway linking receptor-ligand interactions with the activation of adenylyl cyclase and a variety of cellular responses. Multiple transcript variants encoding different isoforms have been found for this gene. Mutations in this gene result in pseudohypoparathyroidism type 1a, pseudohypoparathyroidism type 1b, Albright hereditary osteodystrophy, pseudopseudohypoparathyroidism, McCune-Albright syndrome, progressive osseous heteroplasia, polyostotic fibrous dysplasia of bone, and some pituitary tumors. ENSG00000087460 GNAS complex locus NA
3691 ITGB4 Integrins are heterodimers comprised of alpha and beta subunits, that are noncovalently associated transmembrane glycoprotein receptors. Different combinations of alpha and beta polypeptides form complexes that vary in their ligand-binding specificities. Integrins mediate cell-matrix or cell-cell adhesion, and transduced signals that regulate gene expression and cell growth. This gene encodes the integrin beta 4 subunit, a receptor for the laminins. This subunit tends to associate with alpha 6 subunit and is likely to play a pivotal role in the biology of invasive carcinoma. Mutations in this gene are associated with epidermolysis bullosa with pyloric atresia. Multiple alternatively spliced transcript variants encoding distinct isoforms have been found for this gene. ENSG00000132470 integrin subunit beta 4 NA
5660 PSAP This gene encodes a highly conserved preproprotein that is proteolytically processed to generate four main cleavage products including saposins A, B, C, and D. Each domain of the precursor protein is approximately 80 amino acid residues long with nearly identical placement of cysteine residues and glycosylation sites. Saposins A-D localize primarily to the lysosomal compartment where they facilitate the catabolism of glycosphingolipids with short oligosaccharide groups. The precursor protein exists both as a secretory protein and as an integral membrane protein and has neurotrophic activities. Mutations in this gene have been associated with Gaucher disease and metachromatic leukodystrophy. Alternative splicing results in multiple transcript variants, at least one of which encodes an isoform that is proteolytically processed. ENSG00000197746 prosaposin NA
27122 DKK3 This gene encodes a protein that is a member of the dickkopf family. The secreted protein contains two cysteine rich regions and is involved in embryonic development through its interactions with the Wnt signaling pathway. The expression of this gene is decreased in a variety of cancer cell lines and it may function as a tumor suppressor gene. Alternative splicing results in multiple transcript variants encoding the same protein. ENSG00000050165 dickkopf WNT signaling pathway inhibitor 3 NA
6176 RPLP1 Ribosomes, the organelles that catalyze protein synthesis, consist of a small 40S subunit and a large 60S subunit. Together these subunits are composed of 4 RNA species and approximately 80 structurally distinct proteins. This gene encodes a ribosomal phosphoprotein that is a component of the 60S subunit. The protein, which is a functional equivalent of the E. coli L7/L12 ribosomal protein, belongs to the L12P family of ribosomal proteins. It plays an important role in the elongation step of protein synthesis. Unlike most ribosomal proteins, which are basic, the encoded protein is acidic. Its C-terminal end is nearly identical to the C-terminal ends of the ribosomal phosphoproteins P0 and P2. The P1 protein can interact with P0 and P2 to form a pentameric complex consisting of P1 and P2 dimers, and a P0 monomer. The protein is located in the cytoplasm. Two alternatively spliced transcript variants that encode different proteins have been observed. As is typical for genes encoding ribosomal proteins, there are multiple processed pseudogenes of this gene dispersed through the genome. ENSG00000137818 ribosomal protein lateral stalk subunit P1 NA
9572 NR1D1 This gene encodes a transcription factor that is a member of the nuclear receptor subfamily 1. The encoded protein is a ligand-sensitive transcription factor that negatively regulates the expression of core clock proteins. In particular this protein represses the circadian clock transcription factor aryl hydrocarbon receptor nuclear translocator-like protein 1 (ARNTL). This protein may also be involved in regulating genes that function in metabolic, inflammatory and cardiovascular processes. ENSG00000126368 nuclear receptor subfamily 1 group D member 1 NA
1277 COL1A1 This gene encodes the pro-alpha1 chains of type I collagen whose triple helix comprises two alpha1 chains and one alpha2 chain. Type I is a fibril-forming collagen found in most connective tissues and is abundant in bone, cornea, dermis and tendon. Mutations in this gene are associated with osteogenesis imperfecta types I-IV, Ehlers-Danlos syndrome type VIIA, Ehlers-Danlos syndrome Classical type, Caffey Disease and idiopathic osteoporosis. Reciprocal translocations between chromosomes 17 and 22, where this gene and the gene for platelet-derived growth factor beta are located, are associated with a particular type of skin tumor called dermatofibrosarcoma protuberans, resulting from unregulated expression of the growth factor. Two transcripts, resulting from the use of alternate polyadenylation signals, have been identified for this gene. ENSG00000108821 collagen type I alpha 1 NA
23650 TRIM29 The protein encoded by this gene belongs to the TRIM protein family. It has multiple zinc finger motifs and a leucine zipper motif. It has been proposed to form homo- or heterodimers which are involved in nucleic acid binding. Thus, it may act as a transcriptional regulatory factor involved in carcinogenesis and/or differentiation. It may also function in the suppression of radiosensitivity since it is associated with ataxia telangiectasia phenotype. ENSG00000137699 tripartite motif containing 29 NA
1675 CFD This gene encodes a member of the S1, or chymotrypsin, family of serine peptidases. This protease catalyzes the cleavage of factor B, the rate-limiting step of the alternative pathway of complement activation. This protein also functions as an adipokine, a cell signaling protein secreted by adipocytes, which regulates insulin secretion in mice. Mutations in this gene underlie complement factor D deficiency, which is associated with recurrent bacterial meningitis infections in human patients. Alternative splicing of this gene results in multiple transcript variants. At least one of these variants encodes a preproprotein that is proteolytically processed to generate the mature protease. ENSG00000197766 complement factor D NA
1281 COL3A1 This gene encodes the pro-alpha1 chains of type III collagen, a fibrillar collagen that is found in extensible connective tissues such as skin, lung, uterus, intestine and the vascular system, frequently in association with type I collagen. Mutations in this gene are associated with Ehlers-Danlos syndrome types IV, and with aortic and arterial aneurysms. Two transcripts, resulting from the use of alternate polyadenylation signals, have been identified for this gene. ENSG00000168542 collagen type III alpha 1 chain NA
348 APOE The protein encoded by this gene is a major apoprotein of the chylomicron. It binds to a specific liver and peripheral cell receptor, and is essential for the normal catabolism of triglyceride-rich lipoprotein constituents. This gene maps to chromosome 19 in a cluster with the related apolipoprotein C1 and C2 genes. Mutations in this gene result in familial dysbetalipoproteinemia, or type III hyperlipoproteinemia (HLP III), in which increased plasma cholesterol and triglycerides are the consequence of impaired clearance of chylomicron and VLDL remnants. Alternative splicing results in multiple transcript variants. ENSG00000130203 apolipoprotein E NA
7018 TF This gene encodes a glycoprotein with an approximate molecular weight of 76.5 kDa. It is thought to have been created as a result of an ancient gene duplication event that led to generation of homologous C and N-terminal domains each of which binds one ion of ferric iron. The function of this protein is to transport iron from the intestine, reticuloendothelial system, and liver parenchymal cells to all proliferating cells in the body. This protein may also have a physiologic role as granulocyte/pollen-binding protein (GPBP) involved in the removal of certain organic matter and allergens from serum. ENSG00000091513 transferrin NA
7145 TNS1 The protein encoded by this gene localizes to focal adhesions, regions of the plasma membrane where the cell attaches to the extracellular matrix. This protein crosslinks actin filaments and contains a Src homology 2 (SH2) domain, which is often found in molecules involved in signal transduction. This protein is a substrate of calpain II. Alternative splicing results in multiple transcript variants encoding different isoforms. ENSG00000079308 tensin 1 NA
150094 SIK1 NA ENSG00000142178 salt inducible kinase 1 NA
ENSG00000229732 AC019349.5 NA ENSG00000229732 NA NA
2934 GSN The protein encoded by this gene binds to the ‘plus’ ends of actin monomers and filaments to prevent monomer exchange. The encoded calcium-regulated protein functions in both assembly and disassembly of actin filaments. Defects in this gene are a cause of familial amyloidosis Finnish type (FAF). Multiple transcript variants encoding several different isoforms have been found for this gene. ENSG00000148180 gelsolin NA
57699 CPNE5 Calcium-dependent membrane-binding proteins may regulate molecular events at the interface of the cell membrane and cytoplasm. This gene is one of several genes that encode a calcium-dependent protein containing two N-terminal type II C2 domains and an integrin A domain-like sequence in the C-terminus. Several alternatively spliced transcript variants encoding different isoforms have been found for this gene. More variants may exist, but their full-length natures could not be determined. ENSG00000124772 copine 5 NA
28996 HIPK2 This gene encodes a conserved serine/threonine kinase that is a member of the homeodomain-interacting protein kinase family. The encoded protein interacts with homeodomain transcription factors and many other transcription factors such as p53, and can function as both a corepressor and a coactivator depending on the transcription factor and its subcellular localization. Multiple transcript variants encoding different isoforms have been found for this gene. ENSG00000064393 homeodomain interacting protein kinase 2 NA
55076 TMEM45A NA ENSG00000181458 transmembrane protein 45A NA
604 BCL6 The protein encoded by this gene is a zinc finger transcription factor and contains an N-terminal POZ domain. This protein acts as a sequence-specific repressor of transcription, and has been shown to modulate the transcription of STAT-dependent IL-4 responses of B cells. This protein can interact with a variety of POZ-containing proteins that function as transcription corepressors. This gene is found to be frequently translocated and hypermutated in diffuse large-cell lymphoma (DLCL), and may be involved in the pathogenesis of DLCL. Alternatively spliced transcript variants encoding different protein isoforms have been found for this gene. ENSG00000113916 B-cell CLL/lymphoma 6 NA
171024 SYNPO2 NA ENSG00000172403 synaptopodin 2 NA
ENSG00000265401 RP11-138I1.4 NA ENSG00000265401 NA NA
79957 PAQR6 NA ENSG00000160781 progestin and adipoQ receptor family member 6 NA
2495 FTH1 This gene encodes the heavy subunit of ferritin, the major intracellular iron storage protein in prokaryotes and eukaryotes. It is composed of 24 subunits of the heavy and light ferritin chains. Variation in ferritin subunit composition may affect the rates of iron uptake and release in different tissues. A major function of ferritin is the storage of iron in a soluble and nontoxic state. Defects in ferritin proteins are associated with several neurodegenerative diseases. This gene has multiple pseudogenes. Several alternatively spliced transcript variants have been observed, but their biological validity has not been determined. ENSG00000167996 ferritin heavy chain 1 NA
151516 ASPRV1 NA ENSG00000244617 aspartic peptidase, retroviral-like 1 NA
6095 RORA The protein encoded by this gene is a member of the NR1 subfamily of nuclear hormone receptors. It can bind as a monomer or as a homodimer to hormone response elements upstream of several genes to enhance the expression of those genes. The encoded protein has been shown to interact with NM23-2, a nucleoside diphosphate kinase involved in organogenesis and differentiation, as well as with NM23-1, the product of a tumor metastasis suppressor candidate gene. Also, it has been shown to aid in the transcriptional regulation of some genes involved in circadian rhythm. Four transcript variants encoding different isoforms have been described for this gene. ENSG00000069667 RAR related orphan receptor A NA
6175 RPLP0 Ribosomes, the organelles that catalyze protein synthesis, consist of a small 40S subunit and a large 60S subunit. Together these subunits are composed of 4 RNA species and approximately 80 structurally distinct proteins. This gene encodes a ribosomal protein that is a component of the 60S subunit. The protein, which is the functional equivalent of the E. coli L10 ribosomal protein, belongs to the L10P family of ribosomal proteins. It is a neutral phosphoprotein with a C-terminal end that is nearly identical to the C-terminal ends of the acidic ribosomal phosphoproteins P1 and P2. The P0 protein can interact with P1 and P2 to form a pentameric complex consisting of P1 and P2 dimers, and a P0 monomer. The protein is located in the cytoplasm. Transcript variants derived from alternative splicing exist; they encode the same protein. As is typical for genes encoding ribosomal proteins, there are multiple processed pseudogenes of this gene dispersed through the genome. ENSG00000089157 ribosomal protein lateral stalk subunit P0 NA
1293 COL6A3 This gene encodes the alpha-3 chain, one of the three alpha chains of type VI collagen, a beaded filament collagen found in most connective tissues. The alpha-3 chain of type VI collagen is much larger than the alpha-1 and -2 chains. This difference in size is largely due to an increase in the number of subdomains, similar to von Willebrand Factor type A domains, that are found in the amino terminal globular domain of all the alpha chains. These domains have been shown to bind extracellular matrix proteins, an interaction that explains the importance of this collagen in organizing matrix components. Mutations in the type VI collagen genes are associated with Bethlem myopathy, a rare autosomal dominant proximal myopathy with early childhood onset. Mutations in this gene are also a cause of Ullrich congenital muscular dystrophy, also referred to as Ullrich scleroatonic muscular dystrophy, an autosomal recessive congenital myopathy that is more severe than Bethlem myopathy. Multiple transcript variants have been identified, but the full-length nature of only some of these variants has been described. ENSG00000163359 collagen type VI alpha 3 chain NA
79026 AHNAK NA ENSG00000124942 AHNAK nucleoprotein NA
NA NA NA ENSG00000117289 NA TRUE
6280 S100A9 The protein encoded by this gene is a member of the S100 family of proteins containing 2 EF-hand calcium-binding motifs. S100 proteins are localized in the cytoplasm and/or nucleus of a wide range of cells, and involved in the regulation of a number of cellular processes such as cell cycle progression and differentiation. S100 genes include at least 13 members which are located as a cluster on chromosome 1q21. This protein may function in the inhibition of casein kinase and altered expression of this protein is associated with the disease cystic fibrosis. This antimicrobial protein exhibits antifungal and antibacterial activity. ENSG00000163220 S100 calcium binding protein A9 NA
49860 CRNN This gene encodes a member of the ‘fused gene’ family of proteins, which contain N-terminus EF-hand domains and multiple tandem peptide repeats. The encoded protein contains two EF-hand Ca2+ binding domains in its N-terminus and two glutamine- and threonine-rich 60 amino acid repeats in its C-terminus. This gene, also known as squamous epithelial heat shock protein 53, may play a role in the mucosal/epithelial immune response and epidermal differentiation. ENSG00000143536 cornulin NA
6440 SFTPC This gene encodes the pulmonary-associated surfactant protein C (SPC), an extremely hydrophobic surfactant protein essential for lung function and homeostasis after birth. Pulmonary surfactant is a surface-active lipoprotein complex composed of 90% lipids and 10% proteins which include plasma proteins and apolipoproteins SPA, SPB, SPC and SPD. The surfactant is secreted by the alveolar cells of the lung and maintains the stability of pulmonary tissue by reducing the surface tension of fluids that coat the lung. Multiple mutations in this gene have been identified, which cause pulmonary surfactant metabolism dysfunction type 2, also called pulmonary alveolar proteinosis due to surfactant protein C deficiency, and are associated with interstitial lung disease in older infants, children, and adults. Alternatively spliced transcript variants encoding different protein isoforms have been identified. ENSG00000168484 surfactant protein C NA
125 ADH1B The protein encoded by this gene is a member of the alcohol dehydrogenase family. Members of this enzyme family metabolize a wide variety of substrates, including ethanol, retinol, other aliphatic alcohols, hydroxysteroids, and lipid peroxidation products. This encoded protein, consisting of several homo- and heterodimers of alpha, beta, and gamma subunits, exhibits high activity for ethanol oxidation and plays a major role in ethanol catabolism. Three genes encoding alpha, beta and gamma subunits are tandemly organized in a genomic segment as a gene cluster. Two transcript variants encoding different isoforms have been found for this gene. ENSG00000196616 alcohol dehydrogenase 1B (class I), beta polypeptide NA
3728 JUP This gene encodes a major cytoplasmic protein which is the only known constituent common to submembranous plaques of both desmosomes and intermediate junctions. This protein forms distinct complexes with cadherins and desmosomal cadherins and is a member of the catenin family since it contains a distinct repeating amino acid motif called the armadillo repeat. Mutation in this gene has been associated with Naxos disease. Alternative splicing occurs in this gene; however, not all transcripts have been fully described. ENSG00000173801 junction plakoglobin NA
11067 C10orf10 The expression of this gene is induced by fasting as well as by progesterone. The protein encoded by this gene contains a t-synaptosome-associated protein receptor (SNARE) coiled-coil homology domain and a peroxisomal targeting signal. Production of the encoded protein leads to phosphorylation and activation of the transcription factor ELK1. ENSG00000165507 chromosome 10 open reading frame 10 NA
9638 FEZ1 This gene is an ortholog of the C. elegans unc-76 gene, which is necessary for normal axonal bundling and elongation within axon bundles. Expression of this gene in C. elegans unc-76 mutants can restore to the mutants partial locomotion and axonal fasciculation, suggesting that it also functions in axonal outgrowth. The N-terminal half of the gene product is highly acidic. Alternatively spliced transcript variants encoding different isoforms of this protein have been described. ENSG00000149557 fasciculation and elongation protein zeta 1 NA
8507 ENC1 This gene encodes a member of the kelch-related family of actin-binding proteins. The encoded protein plays a role in the oxidative stress response as a regulator of the transcription factor Nrf2, and expression of this gene may play a role in malignant transformation. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene. ENSG00000171617 ectodermal-neural cortex 1 NA
114907 FBXO32 This gene encodes a member of the F-box protein family which is characterized by an approximately 40 amino acid motif, the F-box. The F-box proteins constitute one of the four subunits of the ubiquitin protein ligase complex called SCFs (SKP1-cullin-F-box), which function in phosphorylation-dependent ubiquitination. The F-box proteins are divided into 3 classes: Fbws containing WD-40 domains, Fbls containing leucine-rich repeats, and Fbxs containing either different protein-protein interaction modules or no recognizable motifs. The protein encoded by this gene belongs to the Fbxs class and contains an F-box domain. This protein is highly expressed during muscle atrophy, whereas mice deficient in this gene were found to be resistant to atrophy. This protein is thus a potential drug target for the treatment of muscle atrophy. Alternative splicing results in multiple transcript variants encoding different isoforms. ENSG00000156804 F-box protein 32 NA
81691 LOC81691 NA ENSG00000005189 exonuclease NEF-sp NA
3315 HSPB1 The protein encoded by this gene is induced by environmental stress and developmental changes. The encoded protein is involved in stress resistance and actin organization and translocates from the cytoplasm to the nucleus upon stress induction. Defects in this gene are a cause of Charcot-Marie-Tooth disease type 2F (CMT2F) and distal hereditary motor neuropathy (dHMN). ENSG00000106211 heat shock protein family B (small) member 1 NA
146225 CMTM2 This gene belongs to the chemokine-like factor gene superfamily, a novel family that links the chemokine and the transmembrane 4 superfamilies of signaling molecules. The protein encoded by this gene may play an important role in testicular development. ENSG00000140932 CKLF like MARVEL transmembrane domain containing 2 NA
6277 S100A6 The protein encoded by this gene is a member of the S100 family of proteins containing 2 EF-hand calcium-binding motifs. S100 proteins are localized in the cytoplasm and/or nucleus of a wide range of cells, and involved in the regulation of a number of cellular processes such as cell cycle progression and differentiation. S100 genes include at least 13 members which are located as a cluster on chromosome 1q21. This protein may function in stimulation of Ca2+-dependent insulin release, stimulation of prolactin secretion, and exocytosis. Chromosomal rearrangements and altered expression of this gene have been implicated in melanoma. ENSG00000197956 S100 calcium binding protein A6 NA
1278 COL1A2 This gene encodes the pro-alpha2 chain of type I collagen whose triple helix comprises two alpha1 chains and one alpha2 chain. Type I is a fibril-forming collagen found in most connective tissues and is abundant in bone, cornea, dermis and tendon. Mutations in this gene are associated with osteogenesis imperfecta types I-IV, Ehlers-Danlos syndrome type VIIB, recessive Ehlers-Danlos syndrome Classical type, idiopathic osteoporosis, and atypical Marfan syndrome. Symptoms associated with mutations in this gene, however, tend to be less severe than mutations in the gene for the alpha1 chain of type I collagen (COL1A1) reflecting the different role of alpha2 chains in matrix integrity. Three transcripts, resulting from the use of alternate polyadenylation signals, have been identified for this gene. ENSG00000164692 collagen type I alpha 2 chain NA
7316 UBC This gene represents a ubiquitin gene, ubiquitin C. The encoded protein is a polyubiquitin precursor. Conjugation of ubiquitin monomers or polymers can lead to various effects within a cell, depending on the residues to which ubiquitin is conjugated. Ubiquitination has been associated with protein degradation, DNA repair, cell cycle regulation, kinase modification, endocytosis, and regulation of other cell signaling pathways. ENSG00000150991 ubiquitin C NA
4666 NACA This gene encodes a protein that associates with basic transcription factor 3 (BTF3) to form the nascent polypeptide-associated complex (NAC). This complex binds to nascent proteins that lack a signal peptide motif as they emerge from the ribosome, blocking interaction with the signal recognition particle (SRP) and preventing mistranslocation to the endoplasmic reticulum. This protein is an IgE autoantigen in atopic dermatitis patients. Alternative splicing results in multiple transcript variants, but the full length nature of some of these variants, including those encoding very large proteins, has not been determined. There are multiple pseudogenes of this gene on different chromosomes. ENSG00000196531 nascent polypeptide-associated complex alpha subunit NA
7070 THY1 This gene encodes a cell surface glycoprotein and member of the immunoglobulin superfamily of proteins. The encoded protein is involved in cell adhesion and cell communication in numerous cell types, but particularly in cells of the immune and nervous systems. The encoded protein is widely used as a marker for hematopoietic stem cells. This gene may function as a tumor suppressor in nasopharyngeal carcinoma. Alternative splicing results in multiple transcript variants. ENSG00000154096 Thy-1 cell surface antigen NA
58476 TP53INP2 NA ENSG00000078804 tumor protein p53 inducible nuclear protein 2 NA
2261 FGFR3 This gene encodes a member of the fibroblast growth factor receptor (FGFR) family, with its amino acid sequence being highly conserved between members and among divergent species. FGFR family members differ from one another in their ligand affinities and tissue distribution. A full-length representative protein would consist of an extracellular region, composed of three immunoglobulin-like domains, a single hydrophobic membrane-spanning segment and a cytoplasmic tyrosine kinase domain. The extracellular portion of the protein interacts with fibroblast growth factors, setting in motion a cascade of downstream signals, ultimately influencing mitogenesis and differentiation. This particular family member binds acidic and basic fibroblast growth hormone and plays a role in bone development and maintenance. Mutations in this gene lead to craniosynostosis and multiple types of skeletal dysplasia. Three alternatively spliced transcript variants that encode different protein isoforms have been described. ENSG00000068078 fibroblast growth factor receptor 3 NA
2355 FOSL2 The Fos gene family consists of 4 members: FOS, FOSB, FOSL1, and FOSL2. These genes encode leucine zipper proteins that can dimerize with proteins of the JUN family, thereby forming the transcription factor complex AP-1. As such, the FOS proteins have been implicated as regulators of cell proliferation, differentiation, and transformation. ENSG00000075426 FOS like 2, AP-1 transcription factor subunit NA
23089 PEG10 This is a paternally expressed imprinted gene that is thought to have been derived from the Ty3/Gypsy family of retrotransposons. It contains two overlapping open reading frames, RF1 and RF2, and expresses two proteins: a shorter, gag-like protein (with a CCHC-type zinc finger domain) from RF1; and a longer, gag/pol-like fusion protein (with an additional aspartic protease motif) from RF1/RF2 by -1 translational frameshifting (-1 FS). While -1 FS has been observed in RNA viruses and transposons in both prokaryotes and eukaryotes, this gene represents the first example of -1 FS in a eukaryotic cellular gene. This gene is highly conserved across mammalian species and retains the heptanucleotide (GGGAAAC) and pseudoknot elements required for -1 FS. It is expressed in adult and embryonic tissues (most notably in placenta) and reported to have a role in cell proliferation, differentiation and apoptosis. Overexpression of this gene has been associated with several malignancies, such as hepatocellular carcinoma and B-cell lymphocytic leukemia. Knockout mice lacking this gene showed early embryonic lethality with placental defects, indicating the importance of this gene in embryonic development. Additional isoforms resulting from alternatively spliced transcript variants, and use of upstream non-AUG (CUG) start codon have been reported for this gene. ENSG00000242265 paternally expressed 10 NA
5339 PLEC Plectin is a prominent member of an important family of structurally and in part functionally related proteins, termed plakins or cytolinkers, that are capable of interlinking different elements of the cytoskeleton. Plakins, with their multi-domain structure and enormous size, not only play crucial roles in maintaining cell and tissue integrity and orchestrating dynamic changes in cytoarchitecture and cell shape, but also serve as scaffolding platforms for the assembly, positioning, and regulation of signaling complexes (reviewed in PMID: 9701547, 11854008, and 17499243). Plectin is expressed as several protein isoforms in a wide range of cell types and tissues from a single gene located on chromosome 8 in humans (PMID: 8633055, 8698233). Until 2010, this locus was named plectin 1 (symbol PLEC1 in human; Plec1 in mouse and rat) and the gene product had been referred to as ‘hemidesmosomal protein 1’ or ‘plectin 1, intermediate filament binding 500kDa’. These names were superseded by plectin. The plectin gene locus in mouse on chromosome 15 has been analyzed in detail (PMID: 10556294, 14559777), revealing a genomic exon-intron organization with well over 40 exons spanning over 62 kb and an unusual 5’ transcript complexity of plectin isoforms. Eleven exons (1-1j) have been identified that alternatively splice directly into a common exon 2 which is the first exon to encode plectin’s highly conserved actin binding domain (ABD). Three additional exons (-1, 0a, and 0) splice into an alternative first coding exon (1c), and two additional exons (2alpha and 3alpha) are optionally spliced within the exons encoding the actin binding domain (exons 2-8). Analysis of the human locus has identified eight of the eleven alternative 5’ exons found in mouse and rat (PMID: 14672974); exons 1i, 1j and 1h have not been confirmed in human. Furthermore, isoforms lacking the central rod domain encoded by exon 31 have been detected in mouse (PMID:10556294), rat (PMID: 9177781), and human (PMID: 11441066, 10780662, 20052759). The short alternative amino-terminal sequences encoded by the different first exons direct the targeting of the various isoforms to distinct subcellular locations (PMID: 14559777). As the expression of specific plectin isoforms was found to be dependent on cell type (tissue) and stage of development (PMID: 10556294, 12542521, 17389230) it appears that each cell type (tissue) contains a unique set (proportion and composition) of plectin isoforms, as if custom-made for specific requirements of the particular cells. Concordantly, individual isoforms were found to carry out distinct and specific functions (PMID: 14559777, 12542521, 18541706). In 1996, a number of groups reported that patients suffering from epidermolysis bullosa simplex with muscular dystrophy (EBS-MD) lacked plectin expression in skin and muscle tissues due to defects in the plectin gene (PMID: 8698233, 8941634, 8636409, 8894687, 8696340). Two other subtypes of plectin-related EBS have been described: EBS-pyloric atresia (PA) and EBS-Ogna. For reviews of plectin-related diseases see PMID: 15810881, 19945614. Mutations in the plectin gene related to human diseases should be named based on the position in NM_000445 (variant 1, isoform 1c), unless the mutation is located within one of the other alternative first exons, in which case the position in the respective Reference Sequence should be used. ENSG00000178209 plectin NA
8407 TAGLN2 The protein encoded by this gene is similar to the protein transgelin, which is one of the earliest markers of differentiated smooth muscle. The specific function of this protein has not yet been determined, although it is thought to be a tumor suppressor. Multiple transcript variants encoding different isoforms have been found for this gene. ENSG00000158710 transgelin 2 NA
27129 HSPB7 NA ENSG00000173641 heat shock protein family B (small) member 7 NA
80740 LY6G6C LY6G6C belongs to a cluster of leukocyte antigen-6 (LY6) genes located in the major histocompatibility complex (MHC) class III region on chromosome 6. Members of the LY6 superfamily typically contain 70 to 80 amino acids, including 8 to 10 cysteines. Most LY6 proteins are attached to the cell surface by a glycosylphosphatidylinositol (GPI) anchor that is directly involved in signal transduction (Mallya et al., 2002 [PubMed 12079290]). ENSG00000204421 lymphocyte antigen 6 complex, locus G6C NA
1832 DSP This gene encodes a protein that anchors intermediate filaments to desmosomal plaques and forms an obligate component of functional desmosomes. Mutations in this gene are the cause of several cardiomyopathies and keratodermas, including skin fragility-woolly hair syndrome. Alternative splicing results in multiple transcript variants. ENSG00000096696 desmoplakin NA
3853 KRT6A The protein encoded by this gene is a member of the keratin gene family. The type II cytokeratins consist of basic or neutral proteins which are arranged in pairs of heterotypic keratin chains coexpressed during differentiation of simple and stratified epithelial tissues. As many as six of this type II cytokeratin (KRT6) have been identified; the multiplicity of the genes is attributed to successive gene duplication events. The genes are expressed with family members KRT16 and/or KRT17 in the filiform papillae of the tongue, the stratified epithelial lining of oral mucosa and esophagus, the outer root sheath of hair follicles, and the glandular epithelia. This KRT6 gene in particular encodes the most abundant isoform. Mutations in these genes have been associated with pachyonychia congenita. In addition, peptides from the C-terminal region of the protein have antimicrobial activity against bacterial pathogens. The type II cytokeratins are clustered in a region of chromosome 12q12-q13. ENSG00000205420 keratin 6A NA
729238 SFTPA2 This gene is one of several genes encoding pulmonary-surfactant associated proteins (SFTPA) located on chromosome 10. Mutations in this gene and a highly similar gene located nearby, which affect the highly conserved carbohydrate recognition domain, are associated with idiopathic pulmonary fibrosis. The current version of the assembly displays only a single centromeric SFTPA gene pair rather than the two gene pairs shown in the previous assembly which were thought to have resulted from a duplication. ENSG00000185303 surfactant protein A2 NA
5187 PER1 This gene is a member of the Period family of genes and is expressed in a circadian pattern in the suprachiasmatic nucleus, the primary circadian pacemaker in the mammalian brain. Genes in this family encode components of the circadian rhythms of locomotor activity, metabolism, and behavior. This gene is upregulated by CLOCK/ARNTL heterodimers but then represses this upregulation in a feedback loop using PER/CRY heterodimers to interact with CLOCK/ARNTL. Polymorphisms in this gene may increase the risk of getting certain cancers. Alternative splicing has been observed in this gene; however, these variants have not been fully described. ENSG00000179094 period circadian clock 1 NA
5524 PTPA Protein phosphatase 2A is one of the four major Ser/Thr phosphatases and is implicated in the negative control of cell growth and division. Protein phosphatase 2A holoenzymes are heterotrimeric proteins composed of a structural subunit A, a catalytic subunit C, and a regulatory subunit B. The regulatory subunit is encoded by a diverse set of genes that have been grouped into the B/PR55, B'/PR61, and B''/PR72 families. These different regulatory subunits confer distinct enzymatic specificities and intracellular localizations to the holoenzyme. The product of this gene belongs to the B' family. This gene encodes a specific phosphotyrosyl phosphatase activator of the dimeric form of protein phosphatase 2A. Alternative splicing results in multiple transcript variants encoding different isoforms. ENSG00000119383 protein phosphatase 2 phosphatase activator NA
8751 ADAM15 The protein encoded by this gene is a member of the ADAM (a disintegrin and metalloproteinase) protein family. ADAM family members are type I transmembrane glycoproteins known to be involved in cell adhesion and proteolytic ectodomain processing of cytokines and adhesion molecules. This protein contains multiple functional domains including a zinc-binding metalloprotease domain, a disintegrin-like domain, as well as a EGF-like domain. Through its disintegrin-like domain, this protein specifically interacts with the integrin beta chain, beta 3. It also interacts with Src family protein-tyrosine kinases in a phosphorylation-dependent manner, suggesting that this protein may function in cell-cell adhesion as well as in cellular signaling. Multiple alternatively spliced transcript variants encoding distinct isoforms have been observed. ENSG00000143537 ADAM metallopeptidase domain 15 NA
8848 TSC22D1 This gene encodes a member of the TSC22 domain family of leucine zipper transcription factors. The encoded protein is stimulated by transforming growth factor beta, and regulates the transcription of multiple genes including C-type natriuretic peptide. The encoded protein may play a critical role in tumor suppression through the induction of cancer cell apoptosis, and a single nucleotide polymorphism in the promoter of this gene has been associated with diabetic nephropathy. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene. ENSG00000102804 TSC22 domain family member 1 NA
# Save the queried Ensembl IDs, one per line, to cluster file 20
write.table(out$query, paste0("../utilities/GTEX2013_sparse_load_sqrt/gene_names_clus_", 20, ".txt"),
            col.names = FALSE, row.names = FALSE, quote = FALSE)
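
Some annotation tables in this report carry a `notfound` column that flags Ensembl IDs mygene could not match, and the call above writes every queried ID regardless of that flag. The following is a minimal sketch, not part of the original analysis, of how unmatched and duplicated IDs could be dropped before writing. The data frame `toy_out`, the helper `keep_matched`, and the temporary output file are hypothetical stand-ins for `as.data.frame(out)` and the real output path.

```r
# Hypothetical stand-in for as.data.frame(out): three queries, one unmatched
toy_out <- data.frame(
  query    = c("ENSG00000205362", "ENSG00000179294", "ENSG00000136997"),
  symbol   = c("MT1A", NA, "MYC"),
  notfound = c(NA, TRUE, NA),
  stringsAsFactors = FALSE
)

# Keep rows whose `notfound` flag is not TRUE. Some tables have no such
# column at all, so return the input unchanged in that case.
keep_matched <- function(df) {
  if (!"notfound" %in% names(df)) return(df)
  df[!(df$notfound %in% TRUE), , drop = FALSE]
}

matched <- keep_matched(toy_out)

# Write the unique matched IDs, one per line, to a temporary file
out_file <- tempfile(fileext = ".txt")
writeLines(unique(matched$query), out_file)
```

Whether unmatched IDs should be kept in the cluster files depends on how those files are used downstream, so this filter is optional.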

GTEx 2013 Factor analysis (sparse loadings: voom counts)

# Read the SFA lambda and F output files for the voom-transformed counts (F is transposed)
lambda_out <- read.table("../sfa_outputs/GTEX2013/voom_gtex/voom_gtex_sfa_lambda.out")
f_out <- t(read.table("../sfa_outputs/GTEX2013/voom_gtex/voom_gtex_sfa_F.out"))

# Read the gene IDs and keep the first 15 characters of each
gene_names <- as.vector(as.matrix(read.table("../sfa_inputs/gene_names_GTEX_V6.txt")))
gene_names <- substring(gene_names, 1, 15)
xli <- gene_names
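
The `substring(gene_names, 1, 15)` step keeps the first 15 characters of each ID. A human Ensembl gene ID is "ENSG" followed by 11 digits, which is exactly 15 characters, so this truncation presumably strips a version suffix. The sketch below illustrates that under the assumption that the input file holds versioned IDs; the version numbers shown are hypothetical.

```r
# Hypothetical versioned IDs of the kind we assume the input file holds
versioned <- c("ENSG00000171401.9", "ENSG00000170477.8")

# Fixed-width truncation, as used above: "ENSG" plus 11 digits is 15 characters
by_width <- substring(versioned, 1, 15)

# Pattern-based alternative: drop everything from the first dot onward
by_pattern <- sub("\\..*$", "", versioned)
```

Both forms give the same result for well-formed human gene IDs. The pattern-based form would also cope with IDs of a different length, which the fixed-width form would not.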

indices_mat <- SFA.ExtractTopFeatures(f_out, top_features = 100, options="min", mult.annotate = TRUE)

# Map each row of feature indices to gene IDs (one row of gene_list per row of indices_mat)
gene_list <- do.call(rbind, lapply(seq_len(nrow(indices_mat)), function(x) gene_names[indices_mat[x, ]]))
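
The row-wise lookup that builds `gene_list` can be illustrated on toy data. This sketch assumes that `SFA.ExtractTopFeatures` returns a matrix of row indices with one row per factor and one column per top feature, which is how the code above uses it. The names `toy_gene_names` and `toy_indices` are hypothetical.

```r
# Toy stand-ins (hypothetical): five gene IDs and a 2 x 3 index matrix
toy_gene_names <- c("ENSG_A", "ENSG_B", "ENSG_C", "ENSG_D", "ENSG_E")
toy_indices <- rbind(c(5, 1, 3),
                     c(2, 4, 1))

# Row-by-row lookup, bound back into a character matrix.
# seq_len(nrow(.)) gives zero iterations for an empty matrix,
# whereas 1:nrow(.) would loop over c(1, 0).
toy_gene_list <- do.call(rbind,
                         lapply(seq_len(nrow(toy_indices)),
                                function(k) toy_gene_names[toy_indices[k, ]]))

# Equivalent vectorised form: index once, then restore the matrix shape
toy_gene_list2 <- matrix(toy_gene_names[toy_indices],
                         nrow = nrow(toy_indices))
```

Row `k` of the result then holds the gene IDs for factor `k`, which is why the annotation step queries `gene_list[1,]` for Factor 1.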

Factor 1 Annotations

out <- mygene::queryMany(gene_list[1,],  scopes="ensembl.gene", fields=c("name", "summary", "symbol"), species="human");
## Finished
## Pass returnall=TRUE to return lists of duplicate or missing query terms.
kable(as.data.frame(out))
query X_id name symbol summary notfound
ENSG00000205362 4489 metallothionein 1A MT1A NA NA
ENSG00000116285 54206 ERBB receptor feedback inhibitor 1 ERRFI1 ERRFI1 is a cytoplasmic protein whose expression is upregulated with cell growth (Wick et al., 1995 [PubMed 7641805]). It shares significant homology with the protein product of rat gene-33, which is induced during cell stress and mediates cell signaling (Makkinje et al., 2000 [PubMed 10749885]; Fiorentino et al., 2000 [PubMed 11003669]). NA
ENSG00000136997 4609 v-myc avian myelocytomatosis viral oncogene homolog MYC The protein encoded by this gene is a multifunctional, nuclear phosphoprotein that plays a role in cell cycle progression, apoptosis and cellular transformation. It functions as a transcription factor that regulates transcription of specific target genes. Mutations, overexpression, rearrangement and translocation of this gene have been associated with a variety of hematopoietic tumors, leukemias and lymphomas, including Burkitt lymphoma. There is evidence to show that alternative translation initiations from an upstream, in-frame non-AUG (CUG) and a downstream AUG start site result in the production of two isoforms with distinct N-termini. The synthesis of non-AUG initiated protein is suppressed in Burkitt’s lymphomas, suggesting its importance in the normal function of this gene. NA
ENSG00000081041 2920 C-X-C motif chemokine ligand 2 CXCL2 This antimicrobial gene is part of a chemokine superfamily that encodes secreted proteins involved in immunoregulatory and inflammatory processes. The superfamily is divided into four subfamilies based on the arrangement of the N-terminal cysteine residues of the mature peptide. This chemokine, a member of the CXC subfamily, is expressed at sites of inflammation and may suppress hematopoietic progenitor cell proliferation. NA
ENSG00000179751 342898 syncollin SYCN NA NA
ENSG00000145506 85409 naked cuticle homolog 2 NKD2 This gene encodes a member of a family of proteins that function as negative regulators of Wnt receptor signaling through interaction with Dishevelled family members. The encoded protein participates in the delivery of transforming growth factor alpha-containing vesicles to the cell membrane. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene. NA
ENSG00000125740 2354 FosB proto-oncogene, AP-1 transcription factor subunit FOSB The Fos gene family consists of 4 members: FOS, FOSB, FOSL1, and FOSL2. These genes encode leucine zipper proteins that can dimerize with proteins of the JUN family, thereby forming the transcription factor complex AP-1. As such, the FOS proteins have been implicated as regulators of cell proliferation, differentiation, and transformation. Alternatively spliced transcript variants encoding different isoforms have been found for this gene. NA
ENSG00000175592 8061 FOS like 1, AP-1 transcription factor subunit FOSL1 The Fos gene family consists of 4 members: FOS, FOSB, FOSL1, and FOSL2. These genes encode leucine zipper proteins that can dimerize with proteins of the JUN family, thereby forming the transcription factor complex AP-1. As such, the FOS proteins have been implicated as regulators of cell proliferation, differentiation, and transformation. Several transcript variants encoding different isoforms have been found for this gene. NA
ENSG00000117143 6675 UDP-N-acetylglucosamine pyrophosphorylase 1 UAP1 NA NA
ENSG00000155090 7071 Kruppel like factor 10 KLF10 This gene encodes a member of a family of proteins that feature C2H2-type zinc finger domains. The encoded protein is a transcriptional repressor that acts as an effector of transforming growth factor beta signaling. Activity of this protein may inhibit the growth of cancers, particularly pancreatic cancer. Alternative splicing results in multiple transcript variants. NA
ENSG00000179294 NA NA NA NA TRUE
ENSG00000113739 8614 stanniocalcin 2 STC2 This gene encodes a secreted, homodimeric glycoprotein that is expressed in a wide variety of tissues and may have autocrine or paracrine functions. The encoded protein has 10 of its 15 cysteine residues conserved among stanniocalcin family members and is phosphorylated by casein kinase 2 exclusively on its serine residues. Its C-terminus contains a cluster of histidine residues which may interact with metal ions. The protein may play a role in the regulation of renal and intestinal calcium and phosphate transport, cell metabolism, or cellular calcium/phosphate homeostasis. Constitutive overexpression of human stanniocalcin 2 in mice resulted in pre- and postnatal growth restriction, reduced bone and skeletal muscle growth, and organomegaly. Expression of this gene is induced by estrogen and altered in some breast cancers. NA
ENSG00000119508 8013 nuclear receptor subfamily 4 group A member 3 NR4A3 This gene encodes a member of the steroid-thyroid hormone-retinoid receptor superfamily. The encoded protein may act as a transcriptional activator. The protein can efficiently bind the NGFI-B Response Element (NBRE). Three different versions of extraskeletal myxoid chondrosarcomas (EMCs) are the result of reciprocal translocations between this gene and other genes. The translocation breakpoints are associated with Nuclear Receptor Subfamily 4, Group A, Member 3 (on chromosome 9) and either Ewing Sarcoma Breakpoint Region 1 (on chromosome 22), RNA Polymerase II, TATA Box-Binding Protein-Associated Factor, 68-KD (on chromosome 17), or Transcription factor 12 (on chromosome 15). Multiple transcript variants encoding different isoforms have been found for this gene. NA
ENSG00000124145 6385 syndecan 4 SDC4 The protein encoded by this gene is a transmembrane (type I) heparan sulfate proteoglycan that functions as a receptor in intracellular signaling. The encoded protein is found as a homodimer and is a member of the syndecan proteoglycan family. This gene is found on chromosome 20, while a pseudogene has been found on chromosome 22. NA
ENSG00000158516 1358 carboxypeptidase A2 CPA2 Three different forms of human pancreatic procarboxypeptidase A have been isolated. The encoded protein represents the A2 form, which is a monomeric protein with different biochemical properties from the A1 and A3 forms. The A2 form of pancreatic procarboxypeptidase acts on aromatic C-terminal residues and is a secreted protein. NA
ENSG00000020577 23034 sterile alpha motif domain containing 4A SAMD4A Sterile alpha motifs (SAMs) in proteins such as SAMD4A are part of an RNA-binding domain that functions as a posttranscriptional regulator by binding to an RNA sequence motif known as the Smaug recognition element, which was named after the Drosophila Smaug protein (Baez and Boccaccio, 2005 [PubMed 16221671]). NA
ENSG00000171621 80176 splA/ryanodine receptor domain and SOCS box containing 1 SPSB1 NA NA
ENSG00000164761 4982 tumor necrosis factor receptor superfamily member 11b TNFRSF11B The protein encoded by this gene is a member of the TNF-receptor superfamily. This protein is an osteoblast-secreted decoy receptor that functions as a negative regulator of bone resorption. This protein specifically binds to its ligand, osteoprotegerin ligand, both of which are key extracellular regulators of osteoclast development. Studies of the mouse counterpart also suggest that this protein and its ligand play a role in lymph-node organogenesis and vascular calcification. Alternatively spliced transcript variants of this gene have been reported, but their full length nature has not been determined. NA
ENSG00000163874 80149 zinc finger CCCH-type containing 12A ZC3H12A ZC3H12A is an MCP1 (CCL2; MIM 158105)-induced protein that acts as a transcriptional activator and causes cell death of cardiomyocytes, possibly via induction of genes associated with apoptosis. NA
ENSG00000103569 366 aquaporin 9 AQP9 The aquaporins are a family of water-selective membrane channels. This gene encodes a member of a subset of aquaporins called the aquaglyceroporins. This protein allows passage of a broad range of noncharged solutes and also stimulates urea transport and osmotic water permeability. This protein may also facilitate the uptake of glycerol in hepatic tissue. The encoded protein may also play a role in specialized leukocyte functions such as immunological response and bactericidal activity. Alternate splicing results in multiple transcript variants. NA
ENSG00000188522 644815 family with sequence similarity 83 member G FAM83G NA NA
ENSG00000144031 79998 ankyrin repeat domain 53 ANKRD53 NA NA
ENSG00000165732 9188 DEAD-box helicase 21 DDX21 DEAD box proteins, characterized by the conserved motif Asp-Glu-Ala-Asp (DEAD), are putative RNA helicases. They are implicated in a number of cellular processes involving alteration of RNA secondary structure such as translation initiation, nuclear and mitochondrial splicing, and ribosome and spliceosome assembly. Based on their distribution patterns, some members of this family are believed to be involved in embryogenesis, spermatogenesis, and cellular growth and division. This gene encodes a DEAD box protein, which is an antigen recognized by autoimmune antibodies from a patient with watermelon stomach disease. This protein unwinds double-stranded RNA, folds single-stranded RNA, and may play important roles in ribosomal RNA biogenesis, RNA editing, RNA transport, and general transcription. NA
ENSG00000060138 8531 Y-box binding protein 3 YBX3 NA NA
ENSG00000143196 1805 dermatopontin DPT Dermatopontin is an extracellular matrix protein with possible functions in cell-matrix interactions and matrix assembly. The protein is found in various tissues and many of its tyrosine residues are sulphated. Dermatopontin is postulated to modify the behavior of TGF-beta through interaction with decorin. NA
ENSG00000110104 79080 coiled-coil domain containing 86 CCDC86 NA NA
ENSG00000132329 10267 receptor activity modifying protein 1 RAMP1 The protein encoded by this gene is a member of the RAMP family of single-transmembrane-domain proteins, called receptor (calcitonin) activity modifying proteins (RAMPs). RAMPs are type I transmembrane proteins with an extracellular N terminus and a cytoplasmic C terminus. RAMPs are required to transport calcitonin-receptor-like receptor (CRLR) to the plasma membrane. CRLR, a receptor with seven transmembrane domains, can function as either a calcitonin-gene-related peptide (CGRP) receptor or an adrenomedullin receptor, depending on which members of the RAMP family are expressed. In the presence of this (RAMP1) protein, CRLR functions as a CGRP receptor. The RAMP1 protein is involved in the terminal glycosylation, maturation, and presentation of the CGRP receptor to the cell surface. Alternative splicing results in multiple transcript variants encoding different isoforms. NA
ENSG00000182253 23336 synemin SYNM The protein encoded by this gene is an intermediate filament (IF) family member. IF proteins are cytoskeletal proteins that confer resistance to mechanical stress and are encoded by a dispersed multigene family. This protein has been found to form a linkage between desmin, which is a subunit of the IF network, and the extracellular matrix, and provides an important structural support in muscle. Two alternatively spliced variants encoding different isoforms have been described for this gene. NA
ENSG00000112208 9532 BCL2 associated athanogene 2 BAG2 BAG proteins compete with Hip for binding to the Hsc70/Hsp70 ATPase domain and promote substrate release. All the BAG proteins have an approximately 45-amino acid BAG domain near the C terminus but differ markedly in their N-terminal regions. The predicted BAG2 protein contains 211 amino acids. The BAG domains of BAG1, BAG2, and BAG3 interact specifically with the Hsc70 ATPase domain in vitro and in mammalian cells. All 3 proteins bind with high affinity to the ATPase domain of Hsc70 and inhibit its chaperone activity in a Hip-repressible manner. NA
ENSG00000255959 ENSG00000255959 NA RP11-804A23.2 NA NA
ENSG00000131462 7283 tubulin gamma 1 TUBG1 This gene encodes a member of the tubulin superfamily. The encoded protein localizes to the centrosome where it binds to microtubules as part of a complex referred to as the gamma-tubulin ring complex. The protein mediates microtubule nucleation and is required for microtubule formation and progression of the cell cycle. A pseudogene of this gene is found on chromosome 7. NA
ENSG00000173641 27129 heat shock protein family B (small) member 7 HSPB7 NA NA
ENSG00000090339 3383 intercellular adhesion molecule 1 ICAM1 This gene encodes a cell surface glycoprotein which is typically expressed on endothelial cells and cells of the immune system. It binds to integrins of type CD11a / CD18, or CD11b / CD18 and is also exploited by Rhinovirus as a receptor. NA
ENSG00000198848 1066 carboxylesterase 1 CES1 This gene encodes a member of the carboxylesterase large family. The family members are responsible for the hydrolysis or transesterification of various xenobiotics, such as cocaine and heroin, and endogenous substrates with ester, thioester, or amide bonds. They may participate in fatty acyl and cholesterol ester metabolism, and may play a role in the blood-brain barrier system. This enzyme is the major liver enzyme and functions in liver drug clearance. Mutations of this gene cause carboxylesterase 1 deficiency. Three transcript variants encoding three different isoforms have been found for this gene. NA
ENSG00000268603 ENSG00000268603 NA RP11-316O14.1 NA NA
ENSG00000114200 590 butyrylcholinesterase BCHE Mutant alleles at the BCHE locus are responsible for suxamethonium sensitivity. Homozygous persons sustain prolonged apnea after administration of the muscle relaxant suxamethonium in connection with surgical anesthesia. The activity of pseudocholinesterase in the serum is low and its substrate behavior is atypical. In the absence of the relaxant, the homozygote is at no known disadvantage. NA
ENSG00000125148 4502 metallothionein 2A MT2A NA NA
ENSG00000205364 4499 metallothionein 1M MT1M This gene encodes a member of the metallothionein superfamily, type 1 family. Metallothioneins have a high content of cysteine residues that bind various heavy metals. These genes are transcriptionally regulated by both heavy metals and glucocorticoids. NA
ENSG00000137124 219 aldehyde dehydrogenase 1 family member B1 ALDH1B1 This protein belongs to the aldehyde dehydrogenases family of proteins. Aldehyde dehydrogenase is the second enzyme of the major oxidative pathway of alcohol metabolism. This gene does not contain introns in the coding sequence. The variation of this locus may affect the development of alcohol-related problems. NA
ENSG00000167034 4824 NK3 homeobox 1 NKX3-1 This gene encodes a homeobox-containing transcription factor. This transcription factor functions as a negative regulator of epithelial cell growth in prostate tissue. Aberrant expression of this gene is associated with prostate tumor progression. Alternate splicing results in multiple transcript variants of this gene. NA
ENSG00000257718 ENSG00000257718 NA RP11-396F22.1 NA NA
ENSG00000108691 6347 C-C motif chemokine ligand 2 CCL2 This gene is one of several cytokine genes clustered on the q-arm of chromosome 17. Chemokines are a superfamily of secreted proteins involved in immunoregulatory and inflammatory processes. The superfamily is divided into four subfamilies based on the arrangement of N-terminal cysteine residues of the mature peptide. This chemokine is a member of the CC subfamily which is characterized by two adjacent cysteine residues. This cytokine displays chemotactic activity for monocytes and basophils but not for neutrophils or eosinophils. It has been implicated in the pathogenesis of diseases characterized by monocytic infiltrates, like psoriasis, rheumatoid arthritis and atherosclerosis. It binds to chemokine receptors CCR2 and CCR4. NA
ENSG00000074219 8463 TEA domain transcription factor 2 TEAD2 NA NA
ENSG00000101447 81610 family with sequence similarity 83 member D FAM83D NA NA
ENSG00000124762 1026 cyclin-dependent kinase inhibitor 1A CDKN1A This gene encodes a potent cyclin-dependent kinase inhibitor. The encoded protein binds to and inhibits the activity of cyclin-cyclin-dependent kinase2 or -cyclin-dependent kinase4 complexes, and thus functions as a regulator of cell cycle progression at G1. The expression of this gene is tightly controlled by the tumor suppressor protein p53, through which this protein mediates the p53-dependent cell cycle G1 phase arrest in response to a variety of stress stimuli. This protein can interact with proliferating cell nuclear antigen, a DNA polymerase accessory factor, and plays a regulatory role in S phase DNA replication and DNA damage repair. This protein was reported to be specifically cleaved by CASP3-like caspases, which thus leads to a dramatic activation of cyclin-dependent kinase2, and may be instrumental in the execution of apoptosis following caspase activation. Mice that lack this gene have the ability to regenerate damaged or missing tissue. Multiple alternatively spliced variants have been found for this gene. NA
ENSG00000168386 11259 filamin A interacting protein 1 like FILIP1L NA NA
ENSG00000118520 383 arginase 1 ARG1 Arginase catalyzes the hydrolysis of arginine to ornithine and urea. At least two isoforms of mammalian arginase exist (types I and II) which differ in their tissue distribution, subcellular localization, immunologic crossreactivity and physiologic function. The type I isoform encoded by this gene, is a cytosolic enzyme and expressed predominantly in the liver as a component of the urea cycle. Inherited deficiency of this enzyme results in argininemia, an autosomal recessive disorder characterized by hyperammonemia. Two transcript variants encoding different isoforms have been found for this gene. NA
ENSG00000102802 84935 mesenteric estrogen dependent adipogenesis MEDAG NA NA
ENSG00000166025 154810 angiomotin like 1 AMOTL1 The protein encoded by this gene is a peripheral membrane protein that is a component of tight junctions or TJs. TJs form an apical junctional structure and act to control paracellular permeability and maintain cell polarity. This protein is related to angiomotin, an angiostatin binding protein that regulates endothelial cell migration and capillary formation. Two transcript variants encoding different isoforms have been found for this gene. NA
ENSG00000154548 135295 serine and arginine rich splicing factor 12 SRSF12 NA NA
ENSG00000254902 ENSG00000254902 ANO1 antisense RNA 1 ANO1-AS1 NA NA
ENSG00000214456 440503 perilipin 5 PLIN5 Members of the perilipin family, such as PLIN5, coat intracellular lipid storage droplets and protect them from lipolytic degradation (Dalen et al., 2007 [PubMed 17234449]). NA
ENSG00000258554 ENSG00000258554 NA RP11-973D8.4 NA NA
ENSG00000130164 3949 low density lipoprotein receptor LDLR The low density lipoprotein receptor (LDLR) gene family consists of cell surface proteins involved in receptor-mediated endocytosis of specific ligands. Low density lipoprotein (LDL) is normally bound at the cell membrane and taken into the cell ending up in lysosomes where the protein is degraded and the cholesterol is made available for repression of microsomal enzyme 3-hydroxy-3-methylglutaryl coenzyme A (HMG CoA) reductase, the rate-limiting step in cholesterol synthesis. At the same time, a reciprocal stimulation of cholesterol ester synthesis takes place. Mutations in this gene cause the autosomal dominant disorder, familial hypercholesterolemia. Alternate splicing results in multiple transcript variants. NA
ENSG00000138709 55132 La ribonucleoprotein domain family member 1B LARP1B This gene encodes a protein containing domains found in the La related protein of Drosophila melanogaster. La motif-containing proteins are thought to be RNA-binding proteins, where the La motif and adjacent amino acids fold into an RNA recognition motif. The La motif is also found in proteins unrelated to the La protein. Alternative splicing has been observed at this locus and multiple variants, encoding distinct isoforms, are described. Additional splice variation has been identified but the full-length nature of these transcripts has not been determined. NA
ENSG00000183655 64410 kelch like family member 25 KLHL25 NA NA
ENSG00000272275 ENSG00000272275 NA RP11-791G15.2 NA NA
ENSG00000134470 3601 interleukin 15 receptor subunit alpha IL15RA This gene encodes a cytokine receptor that specifically binds interleukin 15 (IL15) with high affinity. The receptors of IL15 and IL2 share two subunits, IL2R beta and IL2R gamma. This forms the basis of many overlapping biological activities of IL15 and IL2. The protein encoded by this gene is structurally related to IL2R alpha, an additional IL2-specific alpha subunit necessary for high affinity IL2 binding. Unlike IL2RA, IL15RA is capable of binding IL15 with high affinity independent of other subunits, which suggests distinct roles between IL15 and IL2. This receptor is reported to enhance cell proliferation and expression of apoptosis inhibitor BCL2L1/BCL2-XL and BCL2. Multiple alternatively spliced transcript variants of this gene have been reported. NA
ENSG00000178814 26873 5-oxoprolinase (ATP-hydrolysing) OPLAH The protein encoded by this gene acts as a homodimer, using ATP hydrolysis to catalyze the conversion of 5-oxo-L-proline to L-glutamate. Defects in this gene are a cause of 5-oxoprolinase deficiency (OPLAHD). NA
ENSG00000232815 ENSG00000232815 double homeobox 4 like 50, pseudogene DUX4L50 NA NA
ENSG00000135931 80210 armadillo repeat containing 9 ARMC9 NA NA
ENSG00000250899 ENSG00000250899 NA RP11-253E3.3 NA NA
ENSG00000261616 ENSG00000261616 NA RP11-6O2.3 NA NA
ENSG00000113916 604 B-cell CLL/lymphoma 6 BCL6 The protein encoded by this gene is a zinc finger transcription factor and contains an N-terminal POZ domain. This protein acts as a sequence-specific repressor of transcription, and has been shown to modulate the transcription of STAT-dependent IL-4 responses of B cells. This protein can interact with a variety of POZ-containing proteins that function as transcription corepressors. This gene is found to be frequently translocated and hypermutated in diffuse large-cell lymphoma (DLCL), and may be involved in the pathogenesis of DLCL. Alternatively spliced transcript variants encoding different protein isoforms have been found for this gene. NA
ENSG00000259799 ENSG00000259799 NA RP11-554A11.9 NA NA
ENSG00000148840 23082 peroxisome proliferator-activated receptor gamma, coactivator-related 1 PPRC1 The protein encoded by this gene is similar to PPAR-gamma coactivator 1 (PPARGC1/PGC-1), a protein that can activate mitochondrial biogenesis in part through a direct interaction with nuclear respiratory factor 1 (NRF1). This protein has been shown to interact with NRF1. It is thought to be a functional relative of PPAR-gamma coactivator 1 that activates mitochondrial biogenesis through NRF1 in response to proliferative signals. Alternative splicing results in multiple transcript variants. NA
ENSG00000169583 9022 chloride intracellular channel 3 CLIC3 Chloride channels are a diverse group of proteins that regulate fundamental cellular processes including stabilization of cell membrane potential, transepithelial transport, maintenance of intracellular pH, and regulation of cell volume. Chloride intracellular channel 3 is a member of the p64 family and is predominantly localized in the nucleus and stimulates chloride ion channel activity. In addition, this protein may participate in cellular growth control, based on its association with ERK7, a member of the MAP kinase family. NA
ENSG00000267607 ENSG00000267607 NA CTD-2369P2.8 NA NA
ENSG00000175505 23529 cardiotrophin-like cytokine factor 1 CLCF1 This gene is a member of the glycoprotein (gp)130 cytokine family and encodes cardiotrophin-like cytokine factor 1 (CLCF1). CLCF1 forms a heterodimer complex with cytokine receptor-like factor 1 (CRLF1). This dimer competes with ciliary neurotrophic factor (CNTF) for binding to the ciliary neurotrophic factor receptor (CNTFR) complex, and activates the Jak-STAT signaling cascade. CLCF1 can be actively secreted from cells by forming a complex with soluble type I CRLF1 or soluble CNTFR. CLCF1 is a potent neurotrophic factor, B-cell stimulatory agent and neuroendocrine modulator of pituitary corticotroph function. Defects in CLCF1 cause cold-induced sweating syndrome 2 (CISS2). This syndrome is characterized by a profuse sweating after exposure to cold as well as congenital physical abnormalities of the head and spine. Alternative splicing results in multiple transcript variants encoding distinct isoforms. NA
ENSG00000181026 64782 apoptosis enhancing nuclease AEN NA NA
ENSG00000159388 7832 BTG family member 2 BTG2 The protein encoded by this gene is a member of the BTG/Tob family. This family has structurally related proteins that appear to have antiproliferative properties. This encoded protein is involved in the regulation of the G1/S transition of the cell cycle. NA
ENSG00000187193 4501 metallothionein 1X MT1X NA NA
ENSG00000162878 91461 protein kinase domain containing, cytoplasmic PKDCC NA NA
ENSG00000259827 ENSG00000259827 NA RP11-343H19.2 NA NA
ENSG00000065150 3843 importin 5 IPO5 Nucleocytoplasmic transport, a signal- and energy-dependent process, takes place through nuclear pore complexes embedded in the nuclear envelope. The import of proteins containing a nuclear localization signal (NLS) requires the NLS import receptor, a heterodimer of importin alpha and beta subunits also known as karyopherins. Importin alpha binds the NLS-containing cargo in the cytoplasm and importin beta docks the complex at the cytoplasmic side of the nuclear pore complex. In the presence of nucleoside triphosphates and the small GTP binding protein Ran, the complex moves into the nuclear pore complex and the importin subunits dissociate. Importin alpha enters the nucleoplasm with its passenger protein and importin beta remains at the pore. Interactions between importin beta and the FG repeats of nucleoporins are essential in translocation through the pore complex. The protein encoded by this gene is a member of the importin beta family. NA
ENSG00000123374 1017 cyclin-dependent kinase 2 CDK2 This gene encodes a member of a family of serine/threonine protein kinases that participate in cell cycle regulation. The encoded protein is the catalytic subunit of the cyclin-dependent protein kinase complex, which regulates progression through the cell cycle. Activity of this protein is especially critical during the G1 to S phase transition. This protein associates with and is regulated by other subunits of the complex including cyclin A or E, CDK inhibitor p21Cip1 (CDKN1A), and p27Kip1 (CDKN1B). Alternative splicing results in multiple transcript variants. NA
ENSG00000170890 5319 phospholipase A2 group IB PLA2G1B This gene encodes a secreted member of the phospholipase A2 (PLA2) class of enzymes, which is produced by the pancreatic acinar cells. The encoded calcium-dependent enzyme catalyzes the hydrolysis of the sn-2 position of membrane glycerophospholipids to release arachidonic acid (AA) and lysophospholipids. AA is subsequently converted by downstream metabolic enzymes to several bioactive lipophilic compounds (eicosanoids), including prostaglandins (PGs) and leukotrienes (LTs). The enzyme may be involved in several physiological processes including cell contraction, cell proliferation and pathological response. NA
ENSG00000165995 783 calcium voltage-gated channel auxiliary subunit beta 2 CACNB2 This gene encodes a subunit of a voltage-dependent calcium channel protein that is a member of the voltage-gated calcium channel superfamily. The gene product was originally identified as an antigen target in Lambert-Eaton myasthenic syndrome, an autoimmune disorder. Mutations in this gene are associated with Brugada syndrome. Alternatively spliced variants encoding different isoforms have been described. NA
ENSG00000250471 ENSG00000250471 guanine monophosphate synthase pseudogene 1 GMPSP1 NA NA
ENSG00000090621 8761 poly(A) binding protein cytoplasmic 4 PABPC4 Poly(A)-binding proteins (PABPs) bind to the poly(A) tail present at the 3-prime ends of most eukaryotic mRNAs. PABPC4 or IPABP (inducible PABP) was isolated as an activation-induced T-cell mRNA encoding a protein. Activation of T cells increased PABPC4 mRNA levels in T cells approximately 5-fold. PABPC4 contains 4 RNA-binding domains and proline-rich C terminus. PABPC4 is localized primarily to the cytoplasm. It is suggested that PABPC4 might be necessary for regulation of stability of labile mRNA species in activated T cells. PABPC4 was also identified as an antigen, APP1 (activated-platelet protein-1), expressed on thrombin-activated rabbit platelets. PABPC4 may also be involved in the regulation of protein translation in platelets and megakaryocytes or may participate in the binding or stabilization of polyadenylates in platelet dense granules. Alternatively spliced transcript variants encoding different isoforms have been found for this gene. NA
ENSG00000164023 166929 sphingomyelin synthase 2 SGMS2 Sphingomyelin, a major component of cell and Golgi membranes, is made by the transfer of phosphocholine from phosphatidylcholine onto ceramide, with diacylglycerol as a side product. The protein encoded by this gene is an enzyme that catalyzes this reaction primarily at the cell membrane. The synthesis is reversible, and this enzyme can catalyze the reaction in either direction. The encoded protein is required for cell growth. Three transcript variants encoding the same protein have been found for this gene. There is evidence for more variants, but the full-length nature of their transcripts has not been determined. NA
ENSG00000101187 28231 solute carrier organic anion transporter family member 4A1 SLCO4A1 NA NA
ENSG00000178531 404217 cortexin 1 CTXN1 NA NA
ENSG00000109610 6649 superoxide dismutase 3, extracellular SOD3 This gene encodes a member of the superoxide dismutase (SOD) protein family. SODs are antioxidant enzymes that catalyze the conversion of superoxide radicals into hydrogen peroxide and oxygen, which may protect the brain, lungs, and other tissues from oxidative stress. Proteolytic processing of the encoded protein results in the formation of two distinct homotetramers that differ in their ability to interact with the extracellular matrix (ECM). Homotetramers consisting of the intact protein, or type C subunit, exhibit high affinity for heparin and are anchored to the ECM. Homotetramers consisting of a proteolytically cleaved form of the protein, or type A subunit, exhibit low affinity for heparin and do not interact with the ECM. A mutation in this gene may be associated with increased heart disease risk. NA
ENSG00000159208 148523 circadian associated repressor of transcription CIART NA NA
ENSG00000224376 ENSG00000224376 NA AC017104.6 NA NA
ENSG00000122863 9469 carbohydrate sulfotransferase 3 CHST3 This gene encodes an enzyme which catalyzes the sulfation of chondroitin, a proteoglycan found in the extracellular matrix and most cells which is involved in cell migration and differentiation. Mutations in this gene are associated with spondylepiphyseal dysplasia and humerospinal dysostosis. NA
ENSG00000181458 55076 transmembrane protein 45A TMEM45A NA NA
ENSG00000115604 8809 interleukin 18 receptor 1 IL18R1 The protein encoded by this gene is a cytokine receptor that belongs to the interleukin 1 receptor family. This receptor specifically binds interleukin 18 (IL18), and is essential for IL18 mediated signal transduction. IFN-alpha and IL12 are reported to induce the expression of this receptor in NK and T cells. This gene along with four other members of the interleukin 1 receptor family, including IL1R2, IL1R1, ILRL2 (IL-1Rrp2), and IL1RL1 (T1/ST2), form a gene cluster on chromosome 2q. Alternatively spliced transcript variants encoding different isoforms have been found for this gene. NA
ENSG00000176014 84617 tubulin beta 6 class V TUBB6 NA NA
ENSG00000137193 5292 Pim-1 proto-oncogene, serine/threonine kinase PIM1 The protein encoded by this gene belongs to the Ser/Thr protein kinase family, and PIM subfamily. This gene is expressed primarily in B-lymphoid and myeloid cell lines, and is overexpressed in hematopoietic malignancies and in prostate cancer. It plays a role in signal transduction in blood cells, contributing to both cell proliferation and survival, and thus provides a selective advantage in tumorigenesis. Both the human and orthologous mouse genes have been reported to encode two isoforms (with preferential cellular localization) resulting from the use of alternative in-frame translation initiation codons, the upstream non-AUG (CUG) and downstream AUG codons (PMIDs:16186805, 1825810). NA
ENSG00000117479 10560 solute carrier family 19 member 2 SLC19A2 This gene encodes the thiamin transporter protein. Mutations in this gene cause thiamin-responsive megaloblastic anemia syndrome (TRMA), which is an autosomal recessive disorder characterized by diabetes mellitus, megaloblastic anemia and sensorineural deafness. Two transcript variants encoding different isoforms have been found for this gene. NA
ENSG00000237989 101928399 uncharacterized LOC101928399 TCONS_00029157 NA NA
ENSG00000141076 84916 UTP4, small subunit processome component UTP4 This gene encodes a WD40-repeat-containing protein that is localized to the nucleolus. Mutation of this gene causes North American Indian childhood cirrhosis, a severe intrahepatic cholestasis that results in transient neonatal jaundice, and progresses to periportal fibrosis and cirrhosis in childhood and adolescence. Alternative splicing results in multiple transcript variants. NA
ENSG00000110218 24145 pannexin 1 PANX1 The protein encoded by this gene belongs to the innexin family. Innexin family members are the structural components of gap junctions. This protein and pannexin 2 are abundantly expressed in the central nervous system (CNS) and are coexpressed in various neuronal populations. Studies in Xenopus oocytes suggest that this protein alone and in combination with pannexin 2 may form cell type-specific gap junctions with distinct properties. NA
ENSG00000115758 4953 ornithine decarboxylase 1 ODC1 This gene encodes the rate-limiting enzyme of the polyamine biosynthesis pathway which catalyzes ornithine to putrescine. The activity level for the enzyme varies in response to growth-promoting stimuli and exhibits a high turnover rate in comparison to other mammalian proteins. Originally localized to both chromosomes 2 and 7, the gene encoding this enzyme has been determined to be located on 2p25, with a pseudogene located on 7q31-qter. Multiple alternatively spliced transcript variants encoding distinct isoforms have been identified. NA
ENSG00000173530 8793 tumor necrosis factor receptor superfamily member 10d TNFRSF10D The protein encoded by this gene is a member of the TNF-receptor superfamily. This receptor contains an extracellular TRAIL-binding domain, a transmembrane domain, and a truncated cytoplasmic death domain. This receptor does not induce apoptosis, and has been shown to play an inhibitory role in TRAIL-induced cell apoptosis. NA
ENSG00000204291 1306 collagen type XV alpha 1 chain COL15A1 This gene encodes the alpha chain of type XV collagen, a member of the FACIT collagen family (fibril-associated collagens with interrupted helices). Type XV collagen has a wide tissue distribution but the strongest expression is localized to basement membrane zones so it may function to adhere basement membranes to underlying connective tissue stroma. The proteolytically produced C-terminal fragment of type XV collagen is restin, a potentially antiangiogenic protein that is closely related to endostatin. Mouse studies have shown that collagen XV deficiency is associated with muscle and microvessel deterioration. NA
ENSG00000157110 11030 RNA binding protein with multiple splicing RBPMS This gene encodes a member of the RNA recognition motif family of RNA-binding proteins. The RNA recognition motif is between 80-100 amino acids in length and family members contain one to four copies of the motif. The RNA recognition motif consists of two short stretches of conserved sequence, as well as a few highly conserved hydrophobic residues. The encoded protein has a single, putative RNA recognition motif in its N-terminus. Alternative splicing results in multiple transcript variants encoding different isoforms. NA
ENSG00000112245 7803 protein tyrosine phosphatase type IVA, member 1 PTP4A1 This gene encodes a member of a small class of prenylated protein tyrosine phosphatases (PTPs), which contain a PTP domain and a characteristic C-terminal prenylation motif. The encoded protein is a cell signaling molecule that plays regulatory roles in a variety of cellular processes, including cell proliferation and migration. The protein may also be involved in cancer development and metastasis. This tyrosine phosphatase is a nuclear protein, but may associate with plasma membrane by means of its prenylation motif. Pseudogenes related to this gene are located on chromosomes 1, 2, 5, 7, 11 and X. NA
# Save the Factor 1 Ensembl IDs, one per line, for downstream use.
write.table(as.factor(out$query),
            paste0("../utilities/GTEX2013_sparse_load_voom/gene_names_clus_", 1, ".txt"),
            col.names = FALSE, row.names = FALSE, quote = FALSE)
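
The queryMany() output can contain Ensembl IDs that mygene could not resolve (rows with notfound set to TRUE in the Factor 2 table below). A minimal sketch of filtering such rows before export follows. It uses a toy data frame in place of a live query: the names `out_toy`, `resolved`, `resolved_ids`, and `out_file` are hypothetical, the repeated ID is added artificially to illustrate de-duplication, and the file is written to a temporary location rather than the project path. This is an assumption about what a cleaner export might look like, not the step used in this analysis.

```r
# Toy stand-in for the data frame returned by mygene::queryMany().
# IDs and symbols are copied from the Factor 2 table; the last row is an
# artificial duplicate added only to show de-duplication.
out_toy <- data.frame(
  query    = c("ENSG00000137392", "ENSG00000250606",
               "ENSG00000175535", "ENSG00000137392"),
  symbol   = c("CLPS", NA, "PNLIP", "CLPS"),
  notfound = c(NA, TRUE, NA, NA),
  stringsAsFactors = FALSE
)

# Keep rows whose notfound flag is missing or FALSE, i.e. resolved queries.
resolved <- out_toy[is.na(out_toy$notfound) | !out_toy$notfound, ]

# Drop repeated query IDs while preserving their order.
resolved_ids <- unique(resolved$query)

# Write one ID per line, mirroring the write.table() call above,
# but to a temporary file (hypothetical destination).
out_file <- tempfile(fileext = ".txt")
write.table(resolved_ids, out_file,
            col.names = FALSE, row.names = FALSE, quote = FALSE)
```

Whether unresolved IDs should be dropped depends on the downstream tool: some enrichment tools accept raw Ensembl IDs regardless of whether mygene has an annotation for them, in which case exporting every query, as done above, is reasonable.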

Factor 2 Annotations

out <- mygene::queryMany(gene_list[2,],  scopes="ensembl.gene", fields=c("name", "summary", "symbol"), species="human");
## Finished
## Pass returnall=TRUE to return lists of duplicate or missing query terms.
kable(as.data.frame(out))
symbol X_id query name summary notfound
CLPS 1208 ENSG00000137392 colipase The protein encoded by this gene is a cofactor needed by pancreatic lipase for efficient dietary lipid hydrolysis. It binds to the C-terminal, non-catalytic domain of lipase, thereby stabilizing an active conformation and considerably increasing the overall hydrophobic binding site. The gene product allows lipase to anchor noncovalently to the surface of lipid micelles, counteracting the destabilizing influence of intestinal bile salts. This cofactor is only expressed in pancreatic acinar cells, suggesting regulation of expression by tissue-specific elements. Three transcript variants encoding different isoforms have been found for this gene. NA
CELA2A 63036 ENSG00000142615 chymotrypsin like elastase family member 2A Elastases form a subfamily of serine proteases that hydrolyze many proteins in addition to elastin. Humans have six elastase genes which encode the structurally similar proteins elastase 1, 2, 2A, 2B, 3A, and 3B. Like most of the human elastases, elastase 2A is secreted from the pancreas as a zymogen. In other species, elastase 2A has been shown to preferentially cleave proteins after leucine, methionine, and phenylalanine residues. NA
REG1B 5968 ENSG00000172023 regenerating family member 1 beta This gene is a type I subclass member of the Reg gene family. The Reg gene family is a multigene family grouped into four subclasses, types I, II, III and IV based on the primary structures of the encoded proteins. This gene encodes a protein secreted by the exocrine pancreas that is highly similar to the REG1A protein. The related REG1A protein is associated with islet cell regeneration and diabetogenesis, and may be involved in pancreatic lithogenesis. Reg family members REG1A, REGL, PAP and this gene are tandemly clustered on chromosome 2p12 and may have arisen from the same ancestral gene by gene duplication. NA
CTRC 11330 ENSG00000162438 chymotrypsin C This gene encodes a member of the peptidase S1 family. The encoded protein is a serum calcium-decreasing factor that has chymotrypsin-like protease activity. Alternatively spliced transcript variants have been observed, but their full-length nature has not been determined. NA
SYCN 342898 ENSG00000179751 syncollin NA NA
CTRB2 440387 ENSG00000168928 chymotrypsinogen B2 NA NA
PNLIPRP1 5407 ENSG00000187021 pancreatic lipase related protein 1 NA NA
CELA3B 23436 ENSG00000219073 chymotrypsin like elastase family member 3B Elastases form a subfamily of serine proteases that hydrolyze many proteins in addition to elastin. Humans have six elastase genes which encode the structurally similar proteins elastase 1, 2, 2A, 2B, 3A, and 3B. Unlike other elastases, elastase 3B has little elastolytic activity. Like most of the human elastases, elastase 3B is secreted from the pancreas as a zymogen and, like other serine proteases such as trypsin, chymotrypsin and kallikrein, it has a digestive function in the intestine. Elastase 3B preferentially cleaves proteins after alanine residues. Elastase 3B may also function in the intestinal transport and metabolism of cholesterol. Both elastase 3A and elastase 3B have been referred to as protease E and as elastase 1, and excretion of this protein in fecal material is frequently used as a measure of pancreatic function in clinical assays. NA
PNLIP 5406 ENSG00000175535 pancreatic lipase This gene is a member of the lipase gene family. It encodes a carboxyl esterase that hydrolyzes insoluble, emulsified triglycerides, and is essential for the efficient digestion of dietary fats. This gene is expressed specifically in the pancreas. NA
NA NA ENSG00000250606 NA NA TRUE
RP11-331F4.4 ENSG00000240338 ENSG00000240338 NA NA NA
CELA3A 10136 ENSG00000142789 chymotrypsin like elastase family member 3A Elastases form a subfamily of serine proteases that hydrolyze many proteins in addition to elastin. Humans have six elastase genes which encode the structurally similar proteins elastase 1, 2, 2A, 2B, 3A, and 3B. Unlike other elastases, elastase 3A has little elastolytic activity. Like most of the human elastases, elastase 3A is secreted from the pancreas as a zymogen and, like other serine proteases such as trypsin, chymotrypsin and kallikrein, it has a digestive function in the intestine. Elastase 3A preferentially cleaves proteins after alanine residues. Elastase 3A may also function in the intestinal transport and metabolism of cholesterol. Both elastase 3A and elastase 3B have been referred to as protease E and as elastase 1. NA
NA NA ENSG00000165862 NA NA TRUE
CPB1 1360 ENSG00000153002 carboxypeptidase B1 Three different procarboxypeptidases A and two different procarboxypeptidases B have been isolated. The B1 and B2 forms differ from each other mainly in isoelectric point. Carboxypeptidase B1 is a highly tissue-specific protein and is a useful serum marker for acute pancreatitis and dysfunction of pancreatic transplants. It is not elevated in pancreatic carcinoma. NA
CTRB1 1504 ENSG00000168925 chymotrypsinogen B1 The protein encoded by this gene is one of a family of serine proteases that is secreted into the gastrointestinal tract as an inactive precursor, which is activated by proteolytic cleavage with trypsin. NA
REG3A 5068 ENSG00000172016 regenerating family member 3 alpha This gene encodes a pancreatic secretory protein that may be involved in cell proliferation or differentiation. It has similarity to the C-type lectin superfamily. The enhanced expression of this gene is observed during pancreatic inflammation and liver carcinogenesis. The mature protein also functions as an antimicrobial protein with antibacterial activity. Alternate splicing results in multiple transcript variants that encode the same protein. NA
AMY2A 279 ENSG00000243480 amylase, alpha 2A (pancreatic) This gene encodes a member of the alpha-amylase family of proteins. Amylases are secreted proteins that hydrolyze 1,4-alpha-glucoside bonds in oligosaccharides and polysaccharides, catalyzing the first step in digestion of dietary starch and glycogen. This gene and several family members are present in a gene cluster on chromosome 1. This gene encodes an amylase isoenzyme produced by the pancreas. NA
CPA1 1357 ENSG00000091704 carboxypeptidase A1 This gene encodes a member of the carboxypeptidase A family of zinc metalloproteases. This enzyme is produced in the pancreas and preferentially cleaves C-terminal branched-chain and aromatic amino acids from dietary proteins. This gene and several family members are present in a gene cluster on chromosome 7. Mutations in this gene may be linked to chronic pancreatitis, while elevated protein levels may be associated with pancreatic cancer. NA
PRSS1 5644 ENSG00000204983 protease, serine 1 This gene encodes a trypsinogen, which is a member of the trypsin family of serine proteases. This enzyme is secreted by the pancreas and cleaved to its active form in the small intestine. It is active on peptide linkages involving the carboxyl group of lysine or arginine. Mutations in this gene are associated with hereditary pancreatitis. This gene and several other trypsinogen genes are localized to the T cell receptor beta locus on chromosome 7. NA
CELA2B 51032 ENSG00000215704 chymotrypsin like elastase family member 2B Elastases form a subfamily of serine proteases that hydrolyze many proteins in addition to elastin. Humans have six elastase genes which encode the structurally similar proteins elastase 1, 2, 2A, 2B, 3A, and 3B. Like most of the human elastases, elastase 2B is secreted from the pancreas as a zymogen. In other species, elastase 2B has been shown to preferentially cleave proteins after leucine, methionine, and phenylalanine residues. NA
GP2 2813 ENSG00000169347 glycoprotein 2 This gene encodes an integral membrane protein that is secreted from intracellular zymogen granules and associates with the plasma membrane via glycosylphosphatidylinositol (GPI) linkage. The encoded protein binds pathogens such as enterobacteria, thereby playing an important role in the innate immune response. The C-terminus of this protein is related to the C-terminus of the protein encoded by the neighboring gene, uromodulin (UMOD). Alternative splicing results in multiple transcript variants. NA
PLA2G1B 5319 ENSG00000170890 phospholipase A2 group IB This gene encodes a secreted member of the phospholipase A2 (PLA2) class of enzymes, which is produced by the pancreatic acinar cells. The encoded calcium-dependent enzyme catalyzes the hydrolysis of the sn-2 position of membrane glycerophospholipids to release arachidonic acid (AA) and lysophospholipids. AA is subsequently converted by downstream metabolic enzymes to several bioactive lipophilic compounds (eicosanoids), including prostaglandins (PGs) and leukotrienes (LTs). The enzyme may be involved in several physiological processes including cell contraction, cell proliferation and pathological response. NA
AMBP 259 ENSG00000106927 alpha-1-microglobulin/bikunin precursor This gene encodes a complex glycoprotein secreted in plasma. The precursor is proteolytically processed into distinct functioning proteins: alpha-1-microglobulin, which belongs to the superfamily of lipocalin transport proteins and may play a role in the regulation of inflammatory processes, and bikunin, which is a urinary trypsin inhibitor belonging to the superfamily of Kunitz-type protease inhibitors and plays an important role in many physiological and pathological processes. This gene is located on chromosome 9 in a cluster of lipocalin genes. NA
CD44 960 ENSG00000026508 CD44 molecule (Indian blood group) The protein encoded by this gene is a cell-surface glycoprotein involved in cell-cell interactions, cell adhesion and migration. It is a receptor for hyaluronic acid (HA) and can also interact with other ligands, such as osteopontin, collagens, and matrix metalloproteinases (MMPs). This protein participates in a wide variety of cellular functions including lymphocyte activation, recirculation and homing, hematopoiesis, and tumor metastasis. Transcripts for this gene undergo complex alternative splicing that results in many functionally distinct isoforms, however, the full length nature of some of these variants has not been determined. Alternative splicing is the basis for the structural and functional diversity of this protein, and may be related to tumor metastasis. NA
MT1G 4495 ENSG00000125144 metallothionein 1G NA NA
ADIRF-AS1 ENSG00000272734 ENSG00000272734 ADIRF antisense RNA 1 NA NA
GDF15 9518 ENSG00000130513 growth differentiation factor 15 The protein encoded by this gene belongs to the transforming growth factor-beta (TGF-beta) family. The protein is expressed in a broad range of cell types, acts as a pleiotropic cytokine and is involved in the stress response program of cells after cellular injury. Increased protein levels are associated with disease states such as tissue hypoxia, inflammation, acute injury and oxidative stress. NA
CPA2 1358 ENSG00000158516 carboxypeptidase A2 Three different forms of human pancreatic procarboxypeptidase A have been isolated. The encoded protein represents the A2 form, which is a monomeric protein with different biochemical properties from the A1 and A3 forms. The A2 form of pancreatic procarboxypeptidase acts on aromatic C-terminal residues and is a secreted protein. NA
RP1-68D18.4 ENSG00000255443 ENSG00000255443 NA NA NA
ARHGEF28 64283 ENSG00000214944 Rho guanine nucleotide exchange factor 28 This gene encodes a member of the Rho guanine nucleotide exchange factor family. The encoded protein interacts with low molecular weight neurofilament mRNA and may be involved in the formation of amyotrophic lateral sclerosis neurofilament aggregates. Alternate splicing results in multiple transcript variants. NA
TMEM52 339456 ENSG00000178821 transmembrane protein 52 NA NA
RP11-862L9.3 ENSG00000266844 ENSG00000266844 NA NA NA
FAM174B 400451 ENSG00000185442 family with sequence similarity 174 member B NA NA
ALB 213 ENSG00000163631 albumin Albumin is a soluble, monomeric protein which comprises about one-half of the blood serum protein. Albumin functions primarily as a carrier protein for steroids, fatty acids, and thyroid hormones and plays a role in stabilizing extracellular fluid volume. Albumin is a globular unglycosylated serum protein of molecular weight 65,000. Albumin is synthesized in the liver as preproalbumin which has an N-terminal peptide that is removed before the nascent protein is released from the rough endoplasmic reticulum. The product, proalbumin, is in turn cleaved in the Golgi vesicles to produce the secreted albumin. NA
GLIS3 169792 ENSG00000107249 GLIS family zinc finger 3 This gene is a member of the GLI-similar zinc finger protein family and encodes a nuclear protein with five C2H2-type zinc finger domains. This protein functions as both a repressor and activator of transcription and is specifically involved in the development of pancreatic beta cells, the thyroid, eye, liver and kidney. Mutations in this gene have been associated with neonatal diabetes and congenital hypothyroidism (NDH). Alternatively spliced variants that encode different protein isoforms have been described but the full-length nature of only two have been determined. NA
DLK1 8788 ENSG00000185559 delta like non-canonical Notch ligand 1 This gene encodes a transmembrane protein that contains multiple epidermal growth factor repeats that functions as a regulator of cell growth. The encoded protein is involved in the differentiation of several cell types including adipocytes. This gene is located in a region of chromosome 14 frequently showing uniparental disomy, and is imprinted and expressed from the paternal allele. A single nucleotide variant in this gene is associated with child and adolescent obesity and shows polar overdominance, where heterozygotes carrying an active paternal allele express the phenotype, while mutant homozygotes are normal. NA
PDGFD 80310 ENSG00000170962 platelet derived growth factor D The protein encoded by this gene is a member of the platelet-derived growth factor family. The four members of this family are mitogenic factors for cells of mesenchymal origin and are characterized by a core motif of eight cysteines, seven of which are found in this factor. This gene product only forms homodimers and, therefore, does not dimerize with the other three family members. It differs from alpha and beta members of this family in having an unusual N-terminal domain, the CUB domain. Two splice variants have been identified for this gene. NA
XBP1 7494 ENSG00000100219 X-box binding protein 1 This gene encodes a transcription factor that regulates MHC class II genes by binding to a promoter element referred to as an X box. This gene product is a bZIP protein, which was also identified as a cellular transcription factor that binds to an enhancer in the promoter of the T cell leukemia virus type 1 promoter. It may increase expression of viral proteins by acting as the DNA binding partner of a viral transactivator. It has been found that upon accumulation of unfolded proteins in the endoplasmic reticulum (ER), the mRNA of this gene is processed to an active form by an unconventional splicing mechanism that is mediated by the endonuclease inositol-requiring enzyme 1 (IRE1). The resulting loss of 26 nt from the spliced mRNA causes a frame-shift and an isoform XBP1(S), which is the functionally active transcription factor. The isoform encoded by the unspliced mRNA, XBP1(U), is constitutively expressed, and thought to function as a negative feedback regulator of XBP1(S), which shuts off transcription of target genes during the recovery phase of ER stress. A pseudogene of XBP1 has been identified and localized to chromosome 5. NA
LOC100506314 100506314 ENSG00000247498 uncharacterized LOC100506314 NA NA
NUPR1 26471 ENSG00000176046 nuclear protein 1, transcriptional regulator NA NA
ARG2 384 ENSG00000081181 arginase 2 Arginase catalyzes the hydrolysis of arginine to ornithine and urea. At least two isoforms of mammalian arginase exist (types I and II) which differ in their tissue distribution, subcellular localization, immunologic crossreactivity and physiologic function. The type II isoform encoded by this gene is located in the mitochondria and expressed in extra-hepatic tissues, especially kidney. The physiologic role of this isoform is poorly understood; it is thought to play a role in nitric oxide and polyamine metabolism. Transcript variants of the type II gene resulting from the use of alternative polyadenylation sites have been described. NA
SNHG25 ENSG00000266402 ENSG00000266402 small nucleolar RNA host gene 25 NA NA
CYP27B1 1594 ENSG00000111012 cytochrome P450 family 27 subfamily B member 1 This gene encodes a member of the cytochrome P450 superfamily of enzymes. The cytochrome P450 proteins are monooxygenases which catalyze many reactions involved in drug metabolism and synthesis of cholesterol, steroids and other lipids. The protein encoded by this gene localizes to the inner mitochondrial membrane where it hydroxylates 25-hydroxyvitamin D3 at the 1alpha position. This reaction synthesizes 1alpha,25-dihydroxyvitamin D3, the active form of vitamin D3, which binds to the vitamin D receptor and regulates calcium metabolism. Thus this enzyme regulates the level of biologically active vitamin D and plays an important role in calcium homeostasis. Mutations in this gene can result in vitamin D-dependent rickets type I. NA
EIF4EBP1 1978 ENSG00000187840 eukaryotic translation initiation factor 4E binding protein 1 This gene encodes one member of a family of translation repressor proteins. The protein directly interacts with eukaryotic translation initiation factor 4E (eIF4E), which is a limiting component of the multisubunit complex that recruits 40S ribosomal subunits to the 5’ end of mRNAs. Interaction of this protein with eIF4E inhibits complex assembly and represses translation. This protein is phosphorylated in response to various signals including UV irradiation and insulin signaling, resulting in its dissociation from eIF4E and activation of mRNA translation. NA
TTR 7276 ENSG00000118271 transthyretin This gene encodes transthyretin, one of the three prealbumins including alpha-1-antitrypsin, transthyretin and orosomucoid. Transthyretin is a carrier protein; it transports thyroid hormones in the plasma and cerebrospinal fluid, and also transports retinol (vitamin A) in the plasma. The protein consists of a tetramer of identical subunits. More than 80 different mutations in this gene have been reported; most mutations are related to amyloid deposition, affecting predominantly peripheral nerve and/or the heart, and a small portion of the gene mutations is non-amyloidogenic. The diseases caused by mutations include amyloidotic polyneuropathy, euthyroid hyperthyroxinaemia, amyloidotic vitreous opacities, cardiomyopathy, oculoleptomeningeal amyloidosis, meningocerebrovascular amyloidosis, carpal tunnel syndrome, etc. NA
TMC4 147798 ENSG00000167608 transmembrane channel like 4 NA NA
REG1A 5967 ENSG00000115386 regenerating family member 1 alpha This gene is a type I subclass member of the Reg gene family. The Reg gene family is a multigene family grouped into four subclasses, types I, II, III and IV, based on the primary structures of the encoded proteins. This gene encodes a protein that is secreted by the exocrine pancreas. It is associated with islet cell regeneration and diabetogenesis and may be involved in pancreatic lithogenesis. Reg family members REG1B, REGL, PAP and this gene are tandemly clustered on chromosome 2p12 and may have arisen from the same ancestral gene by gene duplication. NA
SDC1 6382 ENSG00000115884 syndecan 1 The protein encoded by this gene is a transmembrane (type I) heparan sulfate proteoglycan and is a member of the syndecan proteoglycan family. The syndecans mediate cell binding, cell signaling, and cytoskeletal organization and syndecan receptors are required for internalization of the HIV-1 tat protein. The syndecan-1 protein functions as an integral membrane protein and participates in cell proliferation, cell migration and cell-matrix interactions via its receptor for extracellular matrix proteins. Altered syndecan-1 expression has been detected in several different tumor types. While several transcript variants may exist for this gene, the full-length natures of only two have been described to date. These two represent the major variants of this gene and encode the same protein. NA
PIGHP1 ENSG00000259657 ENSG00000259657 phosphatidylinositol glycan anchor biosynthesis class H pseudogene 1 NA NA
GNMT 27232 ENSG00000124713 glycine N-methyltransferase The protein encoded by this gene is an enzyme that catalyzes the conversion of S-adenosyl-L-methionine (along with glycine) to S-adenosyl-L-homocysteine and sarcosine. This protein is found in the cytoplasm and acts as a homotetramer. Defects in this gene are a cause of GNMT deficiency (hypermethioninemia). Alternative splicing results in multiple transcript variants. Naturally occurring readthrough transcription occurs between the upstream CNPY3 (canopy FGF signaling regulator 3) gene and this gene and is represented with GeneID:107080644. NA
TACSTD2 4070 ENSG00000184292 tumor-associated calcium signal transducer 2 This intronless gene encodes a carcinoma-associated antigen. This antigen is a cell surface receptor that transduces calcium signals. Mutations of this gene have been associated with gelatinous drop-like corneal dystrophy. NA
RP11-534L20.4 ENSG00000234981 ENSG00000234981 NA NA NA
RP11-421L21.2 ENSG00000235795 ENSG00000235795 NA NA NA
RP11-173B14.4 ENSG00000228444 ENSG00000228444 NA NA NA
HHEX 3087 ENSG00000152804 hematopoietically expressed homeobox This gene encodes a member of the homeobox family of transcription factors, many of which are involved in developmental processes. Expression in specific hematopoietic lineages suggests that this protein may play a role in hematopoietic differentiation. NA
HIGD1B 51751 ENSG00000131097 HIG1 hypoxia inducible domain family member 1B This gene encodes a member of the hypoxia inducible gene 1 (HIG1) domain family. The encoded protein is localized to the cell membrane and has been linked to tumorigenesis and the progression of pituitary adenomas. Alternative splicing results in multiple transcript variants. NA
GLB1L2 89944 ENSG00000149328 galactosidase beta 1 like 2 NA NA
SPINK1 6690 ENSG00000164266 serine peptidase inhibitor, Kazal type 1 The protein encoded by this gene is a trypsin inhibitor, which is secreted from pancreatic acinar cells into pancreatic juice. It is thought to function in the prevention of trypsin-catalyzed premature activation of zymogens within the pancreas and the pancreatic duct. Mutations in this gene are associated with hereditary pancreatitis and tropical calcific pancreatitis. NA
NPM2 10361 ENSG00000158806 nucleophosmin/nucleoplasmin 2 NA NA
TFPI2 7980 ENSG00000105825 tissue factor pathway inhibitor 2 This gene encodes a member of the Kunitz-type serine proteinase inhibitor family. The protein can inhibit a variety of serine proteases including factor VIIa/tissue factor, factor Xa, plasmin, trypsin, chymotrypsin and plasma kallikrein. This gene has been identified as a tumor suppressor gene in several types of cancer. Alternative splicing results in multiple transcript variants. NA
TNFRSF12A 51330 ENSG00000006327 tumor necrosis factor receptor superfamily member 12A NA NA
SLC39A11 201266 ENSG00000133195 solute carrier family 39 member 11 NA NA
EEF1A1P9 ENSG00000249264 ENSG00000249264 eukaryotic translation elongation factor 1 alpha 1 pseudogene 9 NA NA
VAMP8 8673 ENSG00000118640 vesicle associated membrane protein 8 This gene encodes an integral membrane protein that belongs to the synaptobrevin/vesicle-associated membrane protein subfamily of soluble N-ethylmaleimide-sensitive factor attachment protein receptors (SNAREs). The encoded protein is involved in the fusion of synaptic vesicles with the presynaptic membrane. NA
TRAF3IP2 10758 ENSG00000056972 TRAF3 interacting protein 2 This gene encodes a protein involved in regulating responses to cytokines by members of the Rel/NF-kappaB transcription factor family. These factors play a central role in innate immunity in response to pathogens, inflammatory signals and stress. This gene product interacts with TRAF proteins (tumor necrosis factor receptor-associated factors) and either I-kappaB kinase or MAP kinase to activate either NF-kappaB or Jun kinase. Several alternative transcripts encoding different isoforms have been identified. Another transcript, which does not encode a protein and is transcribed in the opposite orientation, has been identified. Overexpression of this transcript has been shown to reduce expression of at least one of the protein encoding transcripts, suggesting it has a regulatory role in the expression of this gene. NA
ZNF215 7762 ENSG00000149054 zinc finger protein 215 NA NA
THBS4 7060 ENSG00000113296 thrombospondin 4 The protein encoded by this gene belongs to the thrombospondin protein family. Thrombospondin family members are adhesive glycoproteins that mediate cell-to-cell and cell-to-matrix interactions. This protein forms a pentamer and can bind to heparin and calcium. It is involved in local signaling in the developing and adult nervous system, and it contributes to spinal sensitization and neuropathic pain states. This gene is activated during the stromal response to invasive breast cancer. It may also play a role in inflammatory responses in Alzheimer’s disease. Alternative splicing results in multiple transcript variants. NA
AEN 64782 ENSG00000181026 apoptosis enhancing nuclease NA NA
RAB11FIP1 80223 ENSG00000156675 RAB11 family interacting protein 1 This gene encodes one of the Rab11-family interacting proteins (Rab11-FIPs), which play a role in the Rab-11 mediated recycling of vesicles. The encoded protein may be involved in endocytic sorting, trafficking of proteins including integrin subunits and epidermal growth factor receptor (EGFR), and transport between the recycling endosome and the trans-Golgi network. Alternative splicing results in multiple transcript variants. A pseudogene is described on the X chromosome. NA
SLC39A14 23516 ENSG00000104635 solute carrier family 39 member 14 Zinc is an essential cofactor for hundreds of enzymes. It is involved in protein, nucleic acid, carbohydrate, and lipid metabolism, as well as in the control of gene transcription, growth, development, and differentiation. SLC39A14 belongs to a subfamily of proteins that show structural characteristics of zinc transporters (Taylor and Nicholson, 2003 [PubMed 12659941]). NA
TCIRG1 10312 ENSG00000110719 T-cell immune regulator 1, ATPase H+ transporting V0 subunit a3 Through alternate splicing, this gene encodes two proteins with similarity to subunits of the vacuolar ATPase (V-ATPase) but the encoded proteins seem to have different functions. V-ATPase is a multisubunit enzyme that mediates acidification of eukaryotic intracellular organelles. V-ATPase dependent organelle acidification is necessary for such intracellular processes as protein sorting, zymogen activation, and receptor-mediated endocytosis. V-ATPase is comprised of a cytosolic V1 domain and a transmembrane V0 domain. Mutations in this gene are associated with infantile malignant osteopetrosis. NA
NA NA ENSG00000225410 NA NA TRUE
G0S2 50486 ENSG00000123689 G0/G1 switch 2 NA NA
METTL1 4234 ENSG00000037897 methyltransferase like 1 This gene is similar in sequence to the S. cerevisiae YDL201w gene. The gene product contains a conserved S-adenosylmethionine-binding motif and is inactivated by phosphorylation. Alternative splice variants encoding different protein isoforms have been described for this gene. A pseudogene has been identified on chromosome X. NA
KIAA0922 23240 ENSG00000121210 KIAA0922 NA NA
ACTG2 72 ENSG00000163017 actin, gamma 2, smooth muscle, enteric Actins are highly conserved proteins that are involved in various types of cell motility and in the maintenance of the cytoskeleton. Three types of actins, alpha, beta and gamma, have been identified in vertebrates. Alpha actins are found in muscle tissues and are a major constituent of the contractile apparatus. The beta and gamma actins co-exist in most cell types as components of the cytoskeleton and as mediators of internal cell motility. This gene encodes actin gamma 2; a smooth muscle actin found in enteric tissues. Alternative splicing results in multiple transcript variants encoding distinct isoforms. Based on similarity to peptide cleavage of related actins, the mature protein of this gene is formed by removal of two N-terminal peptides. NA
RP11-713M15.2 ENSG00000272502 ENSG00000272502 NA NA NA
TRIM5 85363 ENSG00000132256 tripartite motif containing 5 The protein encoded by this gene is a member of the tripartite motif (TRIM) family. The TRIM motif includes three zinc-binding domains, a RING, a B-box type 1 and a B-box type 2, and a coiled-coil region. The protein forms homo-oligomers via the coiled-coil region and localizes to cytoplasmic bodies. It appears to function as an E3 ubiquitin-ligase and ubiquitinates itself to regulate its subcellular localization. It may play a role in retroviral restriction. Multiple alternatively spliced transcript variants encoding different isoforms have been described for this gene. NA
MATN1-AS1 100129196 ENSG00000186056 MATN1 antisense RNA 1 NA NA
ANPEP 290 ENSG00000166825 alanyl aminopeptidase, membrane Aminopeptidase N is located in the small-intestinal and renal microvillar membrane, and also in other plasma membranes. In the small intestine aminopeptidase N plays a role in the final digestion of peptides generated from hydrolysis of proteins by gastric and pancreatic proteases. Its function in proximal tubular epithelial cells and other cell types is less clear. The large extracellular carboxyterminal domain contains a pentapeptide consensus sequence characteristic of members of the zinc-binding metalloproteinase superfamily. Sequence comparisons with known enzymes of this class showed that CD13 and aminopeptidase N are identical. The latter enzyme was thought to be involved in the metabolism of regulatory peptides by diverse cell types, including small intestinal and renal tubular epithelial cells, macrophages, granulocytes, and synaptic membranes from the CNS. Human aminopeptidase N is a receptor for one strain of human coronavirus that is an important cause of upper respiratory tract infections. Defects in this gene appear to be a cause of various types of leukemia or lymphoma. NA
NEAT1 283131 ENSG00000245532 nuclear paraspeckle assembly transcript 1 (non-protein coding) This gene produces a long non-coding RNA (lncRNA) transcribed from the multiple endocrine neoplasia locus. This lncRNA is retained in the nucleus where it forms the core structural component of the paraspeckle sub-organelles. It may act as a transcriptional regulator for numerous genes, including some genes involved in cancer progression. NA
RP5-1148A21.3 ENSG00000266680 ENSG00000266680 NA NA NA
FAM134B 54463 ENSG00000154153 family with sequence similarity 134 member B The protein encoded by this gene is a cis-Golgi transmembrane protein that may be necessary for the long-term survival of nociceptive and autonomic ganglion neurons. Mutations in this gene are a cause of hereditary sensory and autonomic neuropathy type IIB (HSAN IIB), and this gene may also play a role in susceptibility to vascular dementia. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene. NA
C12orf45 121053 ENSG00000151131 chromosome 12 open reading frame 45 NA NA
SCARNA2 677766 ENSG00000270066 small Cajal body-specific RNA 2 NA NA
FARP1-AS1 ENSG00000231194 ENSG00000231194 FARP1 antisense RNA 1 NA NA
TNFRSF19 55504 ENSG00000127863 tumor necrosis factor receptor superfamily member 19 The protein encoded by this gene is a member of the TNF-receptor superfamily. This receptor is highly expressed during embryonic development. It has been shown to interact with TRAF family members, and to activate JNK signaling pathway when overexpressed in cells. This receptor is capable of inducing apoptosis by a caspase-independent mechanism, and it is thought to play an essential role in embryonic development. Alternatively spliced transcript variants encoding distinct isoforms have been described. NA
ERRFI1 54206 ENSG00000116285 ERBB receptor feedback inhibitor 1 ERRFI1 is a cytoplasmic protein whose expression is upregulated with cell growth (Wick et al., 1995 [PubMed 7641805]). It shares significant homology with the protein product of rat gene-33, which is induced during cell stress and mediates cell signaling (Makkinje et al., 2000 [PubMed 10749885]; Fiorentino et al., 2000 [PubMed 11003669]). NA
MPP6 51678 ENSG00000105926 membrane palmitoylated protein 6 Members of the peripheral membrane-associated guanylate kinase (MAGUK) family function in tumor suppression and receptor clustering by forming multiprotein complexes containing distinct sets of transmembrane, cytoskeletal, and cytoplasmic signaling proteins. All MAGUKs contain a PDZ-SH3-GUK core and are divided into 4 subfamilies, DLG-like (see DLG1; MIM 601014), ZO1-like (see TJP1; MIM 601009), p55-like (see MPP1; MIM 305360), and LIN2-like (see CASK; MIM 300172), based on their size and the presence of additional domains. MPP6 is a member of the p55-like MAGUK subfamily (Tseng et al., 2001 [PubMed 11311936]). NA
CDC20P1 ENSG00000231007 ENSG00000231007 cell division cycle 20 pseudogene 1 NA NA
EMID1 129080 ENSG00000186998 EMI domain containing 1 NA NA
ADGRG1 9289 ENSG00000205336 adhesion G protein-coupled receptor G1 This gene encodes a member of the G protein-coupled receptor family and regulates brain cortical patterning. The encoded protein binds specifically to transglutaminase 2, a component of tissue and tumor stroma implicated as an inhibitor of tumor progression. Mutations in this gene are associated with a brain malformation known as bilateral frontoparietal polymicrogyria. Alternative splicing results in multiple transcript variants. NA
MTHFD2 10797 ENSG00000065911 methylenetetrahydrofolate dehydrogenase (NADP+ dependent) 2, methenyltetrahydrofolate cyclohydrolase This gene encodes a nuclear-encoded mitochondrial bifunctional enzyme with methylenetetrahydrofolate dehydrogenase and methenyltetrahydrofolate cyclohydrolase activities. The enzyme functions as a homodimer and is unique in its absolute requirement for magnesium and inorganic phosphate. Formation of the enzyme-magnesium complex allows binding of NAD. Alternative splicing results in two different transcripts, one protein-coding and the other not protein-coding. This gene has a pseudogene on chromosome 7. NA
FGF18 8817 ENSG00000156427 fibroblast growth factor 18 The protein encoded by this gene is a member of the fibroblast growth factor (FGF) family. FGF family members possess broad mitogenic and cell survival activities, and are involved in a variety of biological processes, including embryonic development, cell growth, morphogenesis, tissue repair, tumor growth, and invasion. It has been shown in vitro that this protein is able to induce neurite outgrowth in PC12 cells. Studies of the similar proteins in mouse and chick suggested that this protein is a pleiotropic growth factor that stimulates proliferation in a number of tissues, most notably the liver and small intestine. Knockout studies of the similar gene in mice implied the role of this protein in regulating proliferation and differentiation of midline cerebellar structures. NA
ZNF321P 399669 ENSG00000213801 zinc finger protein 321, pseudogene NA NA
CASP4 837 ENSG00000196954 caspase 4 This gene encodes a protein that is a member of the cysteine-aspartic acid protease (caspase) family. Sequential activation of caspases plays a central role in the execution-phase of cell apoptosis. Caspases exist as inactive proenzymes composed of a prodomain and a large and small protease subunit. Activation of caspases requires proteolytic processing at conserved internal aspartic residues to generate a heterodimeric enzyme consisting of the large and small subunits. This caspase is able to cleave and activate its own precursor protein, as well as caspase 1 precursor. When overexpressed, this gene induces cell apoptosis. Alternative splicing results in transcript variants encoding distinct isoforms. NA
CTC-301O7.4 ENSG00000197813 ENSG00000197813 NA NA NA
CAPG 822 ENSG00000042493 capping actin protein, gelsolin like This gene encodes a member of the gelsolin/villin family of actin-regulatory proteins. The encoded protein reversibly blocks the barbed ends of F-actin filaments in a Ca2+ and phosphoinositide-regulated manner, but does not sever preformed actin filaments. By capping the barbed ends of actin filaments, the encoded protein contributes to the control of actin-based motility in non-muscle cells. Alternatively spliced transcript variants have been observed for this gene. NA
TOM1L1 10040 ENSG00000141198 target of myb1 like 1 membrane trafficking protein NA NA
TMBIM1 64114 ENSG00000135926 transmembrane BAX inhibitor motif containing 1 NA NA
# Save the Ensembl IDs returned by the factor 2 query, one per line.
write.table(as.factor(out$query),
            paste0("../utilities/GTEX2013_sparse_load_voom/gene_names_clus_", 2, ".txt"),
            col.names = FALSE, row.names = FALSE, quote = FALSE);
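
The export step above is repeated once per factor with the index typed in by hand. A minimal sketch of wrapping it in a helper is shown here; `write_factor_genes`, `toy_gene_list`, and the use of `tempdir()` are assumptions made for illustration and are not part of the original analysis. Unlike the call above, it writes the input IDs directly rather than `out$query`, so it skips the `mygene` lookup.

```r
# Hypothetical helper (not in the original analysis): write the Ensembl IDs
# for factor k to "<out_dir>/gene_names_clus_<k>.txt", one ID per line.
write_factor_genes <- function(ids, k, out_dir) {
  path <- file.path(out_dir, paste0("gene_names_clus_", k, ".txt"))
  write.table(ids, path, col.names = FALSE, row.names = FALSE, quote = FALSE)
  invisible(path)
}

# Toy stand-in for gene_list: one row of Ensembl IDs per factor
# (two IDs per factor, taken from the tables in this report).
toy_gene_list <- rbind(
  c("ENSG00000171401", "ENSG00000170477"),
  c("ENSG00000163631", "ENSG00000107249"),
  c("ENSG00000120875", "ENSG00000131015")
)

# Assumption: write to a temporary directory instead of ../utilities/...
out_dir <- tempdir()
paths <- vapply(
  seq_len(nrow(toy_gene_list)),
  function(k) write_factor_genes(toy_gene_list[k, ], k, out_dir),
  character(1)
)
```

With the real `gene_list`, the same loop would produce one file per factor; pointing `out_dir` at the directory used above would reproduce the existing file names.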

Factor 3 Annotations

out <- mygene::queryMany(gene_list[3,],  scopes="ensembl.gene", fields=c("name", "summary", "symbol"), species="human");
## Finished
kable(as.data.frame(out))
name X_id summary symbol query
dual specificity phosphatase 4 1846 The protein encoded by this gene is a member of the dual specificity protein phosphatase subfamily. These phosphatases inactivate their target kinases by dephosphorylating both the phosphoserine/threonine and phosphotyrosine residues. They negatively regulate members of the mitogen-activated protein (MAP) kinase superfamily (MAPK/ERK, SAPK/JNK, p38), which are associated with cellular proliferation and differentiation. Different members of the family of dual specificity phosphatases show distinct substrate specificities for various MAP kinases, different tissue distribution and subcellular localization, and different modes of inducibility of their expression by extracellular stimuli. This gene product inactivates ERK1, ERK2 and JNK, is expressed in a variety of tissues, and is localized in the nucleus. Two alternatively spliced transcript variants, encoding distinct isoforms, have been observed for this gene. In addition, multiple polyadenylation sites have been reported. DUSP4 ENSG00000120875
UL16 binding protein 2 80328 This gene encodes a major histocompatibility complex (MHC) class I-related molecule that binds to the NKG2D receptor on natural killer (NK) cells to trigger release of multiple cytokines and chemokines that in turn contribute to the recruitment and activation of NK cells. The encoded protein undergoes further processing to generate the mature protein that is either anchored to membrane via a glycosylphosphatidylinositol moiety, or secreted. Many malignant cells secrete the encoded protein to evade immunosurveillance by NK cells. This gene is located in a cluster of multiple MHC class I-related genes on chromosome 6. ULBP2 ENSG00000131015
VLDLR antisense RNA 1 401491 NA VLDLR-AS1 ENSG00000236404
homer scaffolding protein 2 9455 This gene encodes a member of the homer family of dendritic proteins. Members of this family regulate group 1 metabotropic glutamate receptor function. The encoded protein is a postsynaptic density scaffolding protein. Alternative splicing results in multiple transcript variants. Two related pseudogenes have been identified on chromosome 14. HOMER2 ENSG00000103942
podoplanin 10630 This gene encodes a type-I integral membrane glycoprotein with diverse distribution in human tissues. The physiological function of this protein may be related to its mucin-type character. The homologous protein in other species has been described as a differentiation antigen and influenza-virus receptor. The specific function of this protein has not been determined but it has been proposed as a marker of lung injury. Alternatively spliced transcript variants encoding different isoforms have been identified. PDPN ENSG00000162493
glutamic pyruvate transaminase (alanine aminotransferase) 2 84706 This gene encodes a mitochondrial alanine transaminase, a pyridoxal enzyme that catalyzes the reversible transamination between alanine and 2-oxoglutarate to generate pyruvate and glutamate. Alanine transaminases play roles in gluconeogenesis and amino acid metabolism in many tissues including skeletal muscle, kidney, and liver. Activating transcription factor 4 upregulates this gene under metabolic stress conditions in hepatocyte cell lines. A loss of function mutation in this gene has been associated with developmental encephalopathy. Alternative splicing results in multiple transcript variants. GPT2 ENSG00000166123
dual specificity phosphatase 5 1847 The protein encoded by this gene is a member of the dual specificity protein phosphatase subfamily. These phosphatases inactivate their target kinases by dephosphorylating both the phosphoserine/threonine and phosphotyrosine residues. They negatively regulate members of the mitogen-activated protein (MAP) kinase superfamily (MAPK/ERK, SAPK/JNK, p38), which are associated with cellular proliferation and differentiation. Different members of the family of dual specificity phosphatases show distinct substrate specificities for various MAP kinases, different tissue distribution and subcellular localization, and different modes of inducibility of their expression by extracellular stimuli. This gene product inactivates ERK1, is expressed in a variety of tissues with the highest levels in pancreas and brain, and is localized in the nucleus. DUSP5 ENSG00000138166
ring finger protein 144A 9781 The protein encoded by this gene contains a RING finger, a motif known to be involved in protein-DNA and protein-protein interactions. The mouse counterpart of this protein has been shown to interact with Ube2l3/UbcM4, which is a ubiquitin-conjugating enzyme involved in embryonic development. RNF144A ENSG00000151692
integrin subunit alpha 8 8516 Integrins are heterodimeric transmembrane receptor proteins that mediate numerous cellular processes including cell adhesion, cytoskeletal rearrangement, and activation of cell signaling pathways. Integrins are composed of alpha and beta subunits. This gene encodes the alpha 8 subunit of the heterodimeric integrin alpha8beta1 protein. The encoded protein is a single-pass type 1 membrane protein that contains multiple FG-GAP repeats. This repeat is predicted to fold into a beta propeller structure. This gene regulates the recruitment of mesenchymal cells into epithelial structures, mediates cell-cell interactions, and regulates neurite outgrowth of sensory and motor neurons. The integrin alpha8beta1 protein thus plays an important role in wound-healing and organogenesis. Mutations in this gene have been associated with renal hypodysplasia/aplasia-1 (RHDA1) and with several animal models of chronic kidney disease. Alternate splicing results in multiple transcript variants encoding distinct isoforms. ITGA8 ENSG00000077943
very low density lipoprotein receptor 7436 The low density lipoprotein receptor (LDLR) gene family consists of cell surface proteins involved in receptor-mediated endocytosis of specific ligands. This gene encodes a lipoprotein receptor that is a member of the LDLR family and plays important roles in VLDL-triglyceride metabolism and the reelin signaling pathway. Mutations in this gene cause VLDLR-associated cerebellar hypoplasia. Alternative splicing generates multiple transcript variants encoding distinct isoforms for this gene. VLDLR ENSG00000147852
cholinergic receptor nicotinic epsilon subunit 1145 Acetylcholine receptors at mature mammalian neuromuscular junctions are pentameric protein complexes composed of four subunits in the ratio of two alpha subunits to one beta, one epsilon, and one delta subunit. The acetylcholine receptor changes subunit composition shortly after birth when the epsilon subunit replaces the gamma subunit seen in embryonic receptors. Mutations in the epsilon subunit are associated with congenital myasthenic syndrome. CHRNE ENSG00000108556
glycoprotein 2 2813 This gene encodes an integral membrane protein that is secreted from intracellular zymogen granules and associates with the plasma membrane via glycosylphosphatidylinositol (GPI) linkage. The encoded protein binds pathogens such as enterobacteria, thereby playing an important role in the innate immune response. The C-terminus of this protein is related to the C-terminus of the protein encoded by the neighboring gene, uromodulin (UMOD). Alternative splicing results in multiple transcript variants. GP2 ENSG00000169347
brain enriched guanylate kinase associated 57596 NA BEGAIN ENSG00000183092
coiled-coil domain containing 150 pseudogene 1 ENSG00000256304 NA CCDC150P1 ENSG00000256304
colony stimulating factor 3 1440 The protein encoded by this gene is a cytokine that controls the production, differentiation, and function of granulocytes. The active protein is found extracellularly. Alternatively spliced transcript variants have been described for this gene. CSF3 ENSG00000108342
transmembrane protein 266 123591 NA TMEM266 ENSG00000169758
NA ENSG00000255201 NA RP11-350N15.4 ENSG00000255201
BCL2 related protein A1 597 This gene encodes a member of the BCL-2 protein family. The proteins of this family form hetero- or homodimers and act as anti- and pro-apoptotic regulators that are involved in a wide variety of cellular activities such as embryonic development, homeostasis and tumorigenesis. The protein encoded by this gene is able to reduce the release of pro-apoptotic cytochrome c from mitochondria and block caspase activation. This gene is a direct transcription target of NF-kappa B in response to inflammatory mediators, and is up-regulated by different extracellular signals, such as granulocyte-macrophage colony-stimulating factor (GM-CSF), CD40, phorbol ester and inflammatory cytokine TNF and IL-1, which suggests a cytoprotective function that is essential for lymphocyte activation as well as cell survival. Alternatively spliced transcript variants encoding different isoforms have been found for this gene. BCL2A1 ENSG00000140379
NA ENSG00000253785 NA CTC-308K20.3 ENSG00000253785
protease, serine 1 5644 This gene encodes a trypsinogen, which is a member of the trypsin family of serine proteases. This enzyme is secreted by the pancreas and cleaved to its active form in the small intestine. It is active on peptide linkages involving the carboxyl group of lysine or arginine. Mutations in this gene are associated with hereditary pancreatitis. This gene and several other trypsinogen genes are localized to the T cell receptor beta locus on chromosome 7. PRSS1 ENSG00000204983
oculomedin 10896 The protein encoded by this gene is induced by cyclic mechanical stretching in trabecular cells of the eye and it is also expressed in retina. This protein may play a role in trabecular meshwork function and the development of glaucoma. OCLM ENSG00000262180
serine peptidase inhibitor, Kazal type 1 6690 The protein encoded by this gene is a trypsin inhibitor, which is secreted from pancreatic acinar cells into pancreatic juice. It is thought to function in the prevention of trypsin-catalyzed premature activation of zymogens within the pancreas and the pancreatic duct. Mutations in this gene are associated with hereditary pancreatitis and tropical calcific pancreatitis. SPINK1 ENSG00000164266
alkaline ceramidase 2 340485 The sphingolipid metabolite sphingosine-1-phosphate promotes cell proliferation and survival, whereas its precursor, sphingosine, has the opposite effect. The ceramidase ACER2 hydrolyzes very long chain ceramides to generate sphingosine (Xu et al., 2006 [PubMed 16940153]). ACER2 ENSG00000177076
NA ENSG00000261575 NA RP11-259G18.1 ENSG00000261575
NA ENSG00000236364 NA RP11-525G13.2 ENSG00000236364
NA ENSG00000259326 NA RP11-102L12.2 ENSG00000259326
early growth response 3 1960 This gene encodes a transcriptional regulator that belongs to the EGR family of C2H2-type zinc-finger proteins. It is an immediate-early growth response gene which is induced by mitogenic stimulation. The protein encoded by this gene participates in the transcriptional regulation of genes in controlling biological rhythm. It may also play a role in a wide variety of processes including muscle development, lymphocyte development, endothelial cell growth and migration, and neuronal development. Alternative splicing results in multiple transcript variants encoding distinct isoforms. EGR3 ENSG00000179388
NA ENSG00000258895 NA CTD-2643K12.1 ENSG00000258895
phospholamban 5350 The protein encoded by this gene is found as a pentamer and is a major substrate for the cAMP-dependent protein kinase in cardiac muscle. The encoded protein is an inhibitor of cardiac muscle sarcoplasmic reticulum Ca(2+)-ATPase in the unphosphorylated state, but inhibition is relieved upon phosphorylation of the protein. The subsequent activation of the Ca(2+) pump leads to enhanced muscle relaxation rates, thereby contributing to the inotropic response elicited in heart by beta-agonists. The encoded protein is a key regulator of cardiac diastolic function. Mutations in this gene are a cause of inherited human dilated cardiomyopathy with refractory congestive heart failure, and also familial hypertrophic cardiomyopathy. PLN ENSG00000198523
ankyrin repeat and SOCS box containing 2 51676 This gene encodes a member of the ankyrin repeat and SOCS box-containing (ASB) protein family. These proteins play a role in protein degradation by coupling suppressor of cytokine signalling (SOCS) proteins with the elongin BC complex. The encoded protein is a subunit of a multimeric E3 ubiquitin ligase complex that mediates the degradation of actin-binding proteins. This gene plays a role in retinoic acid-induced growth inhibition and differentiation of myeloid leukemia cells. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene. ASB2 ENSG00000100628
uncharacterized LOC101928399 101928399 NA TCONS_00029157 ENSG00000237989
polymerase (DNA) beta 5423 The protein encoded by this gene is a DNA polymerase involved in base excision and repair, also called gap-filling DNA synthesis. The encoded protein, acting as a monomer, is normally found in the cytoplasm, but it translocates to the nucleus upon DNA damage. Several transcript variants of this gene exist, but the full-length nature of only one has been described to date. POLB ENSG00000070501
small nucleolar RNA, H/ACA box 64 26784 NA SNORA64 ENSG00000207405
angiomotin like 1 154810 The protein encoded by this gene is a peripheral membrane protein that is a component of tight junctions or TJs. TJs form an apical junctional structure and act to control paracellular permeability and maintain cell polarity. This protein is related to angiomotin, an angiostatin binding protein that regulates endothelial cell migration and capillary formation. Two transcript variants encoding different isoforms have been found for this gene. AMOTL1 ENSG00000166025
zinc finger protein 385C 201181 NA ZNF385C ENSG00000187595
NA ENSG00000255513 NA AC005363.9 ENSG00000255513
chromodomain helicase DNA binding protein 7 55636 This gene encodes a protein that contains several helicase family domains. Mutations in this gene have been found in some patients with the CHARGE syndrome. Two transcript variants encoding different isoforms have been found for this gene. CHD7 ENSG00000171316
NA ENSG00000229299 NA RP4-583P15.10 ENSG00000229299
BCL2/adenovirus E1B 19kDa interacting protein 3 664 This gene encodes a mitochondrial protein that contains a BH3 domain and acts as a pro-apoptotic factor. The encoded protein interacts with anti-apoptotic proteins, including the E1B 19 kDa protein and Bcl2. This gene is silenced in tumors by DNA methylation. BNIP3 ENSG00000176171
phospholipase A2 group IB 5319 This gene encodes a secreted member of the phospholipase A2 (PLA2) class of enzymes, which is produced by the pancreatic acinar cells. The encoded calcium-dependent enzyme catalyzes the hydrolysis of the sn-2 position of membrane glycerophospholipids to release arachidonic acid (AA) and lysophospholipids. AA is subsequently converted by downstream metabolic enzymes to several bioactive lipophilic compounds (eicosanoids), including prostaglandins (PGs) and leukotrienes (LTs). The enzyme may be involved in several physiological processes including cell contraction, cell proliferation and pathological response. PLA2G1B ENSG00000170890
RNA, 7SK small nuclear pseudogene 70 ENSG00000252464 NA RN7SKP70 ENSG00000252464
hypoxia inducible lipid droplet associated 29923 NA HILPDA ENSG00000135245
NA ENSG00000259407 NA RP11-158M2.3 ENSG00000259407
apolipoprotein L4 80832 The protein encoded by this gene is a member of the apolipoprotein L family and may play a role in lipid exchange and transport throughout the body, as well as in reverse cholesterol transport from peripheral cells to the liver. Two transcript variants encoding two different isoforms have been found for this gene. Only one of the isoforms appears to be a secreted protein. APOL4 ENSG00000100336
mesoderm specific transcript 4232 This gene encodes a member of the alpha/beta hydrolase superfamily. It is imprinted, exhibiting preferential expression from the paternal allele in fetal tissues, and isoform-specific imprinting in lymphocytes. The loss of imprinting of this gene has been linked to certain types of cancer and may be due to promoter switching. The encoded protein may play a role in development. Alternatively spliced transcript variants encoding multiple isoforms have been identified for this gene. Pseudogenes of this gene are located on the short arm of chromosomes 3 and 4, and the long arm of chromosomes 6 and 15. MEST ENSG00000106484
NA ENSG00000270890 NA RP3-468K18.6 ENSG00000270890
NA ENSG00000256469 NA RP11-856F16.2 ENSG00000256469
ATPase Na+/K+ transporting subunit alpha 2 477 The protein encoded by this gene belongs to the family of P-type cation transport ATPases, and to the subfamily of Na+/K+ -ATPases. Na+/K+ -ATPase is an integral membrane protein responsible for establishing and maintaining the electrochemical gradients of Na and K ions across the plasma membrane. These gradients are essential for osmoregulation, for sodium-coupled transport of a variety of organic and inorganic molecules, and for electrical excitability of nerve and muscle. This enzyme is composed of two subunits, a large catalytic subunit (alpha) and a smaller glycoprotein subunit (beta). The catalytic subunit of Na+/K+ -ATPase is encoded by multiple genes. This gene encodes an alpha 2 subunit. Mutations in this gene result in familial basilar or hemiplegic migraines, and in a rare syndrome known as alternating hemiplegia of childhood. ATP1A2 ENSG00000018625
glutamate-ammonia ligase 2752 The protein encoded by this gene belongs to the glutamine synthetase family. It catalyzes the synthesis of glutamine from glutamate and ammonia in an ATP-dependent reaction. This protein plays a role in ammonia and glutamate detoxification, acid-base homeostasis, cell signaling, and cell proliferation. Glutamine is an abundant amino acid, and is important to the biosynthesis of several amino acids, pyrimidines, and purines. Mutations in this gene are associated with congenital glutamine deficiency, and overexpression of this gene was observed in some primary liver cancer samples. There are six pseudogenes of this gene found on chromosomes 2, 5, 9, 11, and 12. Alternative splicing results in multiple transcript variants. GLUL ENSG00000135821
enhancer of zeste 2 polycomb repressive complex 2 subunit 2146 This gene encodes a member of the Polycomb-group (PcG) family. PcG family members form multimeric protein complexes, which are involved in maintaining the transcriptional repressive state of genes over successive cell generations. This protein associates with the embryonic ectoderm development protein, the VAV1 oncoprotein, and the X-linked nuclear protein. This protein may play a role in the hematopoietic and central nervous systems. Multiple alternatively spliced transcript variants encoding distinct isoforms have been identified for this gene. EZH2 ENSG00000106462
transthyretin 7276 This gene encodes transthyretin, one of the three prealbumins including alpha-1-antitrypsin, transthyretin and orosomucoid. Transthyretin is a carrier protein; it transports thyroid hormones in the plasma and cerebrospinal fluid, and also transports retinol (vitamin A) in the plasma. The protein consists of a tetramer of identical subunits. More than 80 different mutations in this gene have been reported; most mutations are related to amyloid deposition, affecting predominantly peripheral nerve and/or the heart, and a small portion of the gene mutations is non-amyloidogenic. The diseases caused by mutations include amyloidotic polyneuropathy, euthyroid hyperthyroxinaemia, amyloidotic vitreous opacities, cardiomyopathy, oculoleptomeningeal amyloidosis, meningocerebrovascular amyloidosis, carpal tunnel syndrome, etc. TTR ENSG00000118271
regulator of G-protein signaling 16 6004 The protein encoded by this gene belongs to the ‘regulator of G protein signaling’ family. It inhibits signal transduction by increasing the GTPase activity of G protein alpha subunits. It also may play a role in regulating the kinetics of signaling in the phototransduction cascade. RGS16 ENSG00000143333
small nucleolar RNA host gene 15 ENSG00000232956 NA SNHG15 ENSG00000232956
laminin subunit alpha 3 3909 The protein encoded by this gene belongs to the laminin family of secreted molecules. Laminins are heterotrimeric molecules that consist of alpha, beta, and gamma subunits that assemble through a coiled-coil domain. Laminins are essential for formation and function of the basement membrane and have additional functions in regulating cell migration and mechanical signal transduction. This gene encodes an alpha subunit and is responsive to several epithelial-mesenchymal regulators including keratinocyte growth factor, epidermal growth factor and insulin-like growth factor. Mutations in this gene have been identified as the cause of Herlitz type junctional epidermolysis bullosa and laryngoonychocutaneous syndrome. Alternative splicing and alternative promoter usage result in multiple transcript variants. LAMA3 ENSG00000053747
zinc finger protein 878 729747 NA ZNF878 ENSG00000257446
general transcription factor IIi pseudogene 14 ENSG00000226002 NA GTF2IP14 ENSG00000226002
NA ENSG00000219470 NA RP3-337H4.6 ENSG00000219470
NA ENSG00000258168 NA RP11-588H23.3 ENSG00000258168
cortexin 1 404217 NA CTXN1 ENSG00000178531
taste 2 receptor member 19 259294 NA TAS2R19 ENSG00000212124
transmembrane protein 120B 144404 NA TMEM120B ENSG00000188735
guanine nucleotide binding protein-like 3 (nucleolar)-like pseudogene 1 ENSG00000215032 NA GNL3LP1 ENSG00000215032
uncharacterized LOC101928371 101928371 NA LOC101928371 ENSG00000225420
family with sequence similarity 153 member C ENSG00000204677 NA FAM153C ENSG00000204677
carnitine palmitoyltransferase 1B 1375 The protein encoded by this gene, a member of the carnitine/choline acetyltransferase family, is the rate-controlling enzyme of the long-chain fatty acid beta-oxidation pathway in muscle mitochondria. This enzyme is required for the net transport of long-chain fatty acyl-CoAs from the cytoplasm into the mitochondria. Multiple transcript variants encoding different isoforms have been found for this gene, and read-through transcripts are expressed from the upstream locus that include exons from this gene. CPT1B ENSG00000205560
transmembrane protein 97 27346 TMEM97 is a conserved integral membrane protein that plays a role in controlling cellular cholesterol levels (Bartz et al., 2009 [PubMed 19583955]). TMEM97 ENSG00000109084
actin binding LIM protein 1 3983 This gene encodes a cytoskeletal LIM protein that binds to actin filaments via a domain that is homologous to erythrocyte dematin. LIM domains, found in over 60 proteins, play key roles in the regulation of developmental pathways. LIM domains also function as protein-binding interfaces, mediating specific protein-protein interactions. The protein encoded by this gene could mediate such interactions between actin filaments and cytoplasmic targets. Alternatively spliced transcript variants encoding different isoforms have been identified. ABLIM1 ENSG00000099204
zinc finger protein 841 284371 NA ZNF841 ENSG00000197608
pyrophosphatase (inorganic) 1 5464 The protein encoded by this gene is a member of the inorganic pyrophosphatase (PPase) family. PPases catalyze the hydrolysis of pyrophosphate to inorganic phosphate, which is important for the phosphate metabolism of cells. Studies of a similar protein in bovine suggested a cytoplasmic localization of this enzyme. PPA1 ENSG00000180817
nucleoredoxin 64359 This gene encodes a member of the thioredoxin superfamily, a group of small, multifunctional redox-active proteins. Members of this family are characterized by a conserved active motif called the thioredoxin fold that catalyzes disulfide bond formation and isomerization. The encoded protein acts as a redox-dependent regulator of the Wnt signaling pathway and is involved in cell growth and differentiation. NXN ENSG00000167693
metallothionein 2A 4502 NA MT2A ENSG00000125148
tankyrase 1 binding protein 1 85456 NA TNKS1BP1 ENSG00000149115
TYRO3 protein tyrosine kinase 7301 The gene is part of a 3-member transmembrane receptor kinase receptor family with a processed pseudogene distal on chromosome 15. The encoded protein is activated by the products of the growth arrest-specific gene 6 and protein S genes and is involved in controlling cell survival and proliferation, spermatogenesis, immunoregulation and phagocytosis. The encoded protein has also been identified as a cell entry factor for Ebola and Marburg viruses. TYRO3 ENSG00000092445
DEAD/H-box helicase 11 1663 DEAD box proteins, characterized by the conserved motif Asp-Glu-Ala-Asp (DEAD), are putative RNA helicases. They are implicated in a number of cellular processes involving alteration of RNA secondary structure such as translation initiation, nuclear and mitochondrial splicing, and ribosome and spliceosome assembly. Based on their distribution patterns, some members of this family are believed to be involved in embryogenesis, spermatogenesis, and cellular growth and division. This gene encodes a DEAD box protein, which is an enzyme that possesses both ATPase and DNA helicase activities. This gene is a homolog of the yeast CHL1 gene, and may function to maintain chromosome transmission fidelity and genome stability. Alternative splicing results in multiple transcript variants encoding distinct isoforms. DDX11 ENSG00000013573
chymotrypsinogen B1 1504 The protein encoded by this gene is one of a family of serine proteases that is secreted into the gastrointestinal tract as an inactive precursor, which is activated by proteolytic cleavage with trypsin. CTRB1 ENSG00000168925
coiled-coil domain containing 144B (pseudogene) 284047 NA CCDC144B ENSG00000154874
ATP binding cassette subfamily C member 4 10257 The protein encoded by this gene is a member of the superfamily of ATP-binding cassette (ABC) transporters. ABC proteins transport various molecules across extra- and intra-cellular membranes. ABC genes are divided into seven distinct subfamilies (ABC1, MDR/TAP, MRP, ALD, OABP, GCN20, White). This protein is a member of the MRP subfamily which is involved in multi-drug resistance. This family member plays a role in cellular detoxification as a pump for its substrate, organic anions. It may also function in prostaglandin-mediated cAMP signaling in ciliogenesis. Alternative splicing of this gene results in multiple transcript variants. ABCC4 ENSG00000125257
regenerating family member 1 alpha 5967 This gene is a type I subclass member of the Reg gene family. The Reg gene family is a multigene family grouped into four subclasses, types I, II, III and IV, based on the primary structures of the encoded proteins. This gene encodes a protein that is secreted by the exocrine pancreas. It is associated with islet cell regeneration and diabetogenesis and may be involved in pancreatic lithogenesis. Reg family members REG1B, REGL, PAP and this gene are tandemly clustered on chromosome 2p12 and may have arisen from the same ancestral gene by gene duplication. REG1A ENSG00000115386
NA ENSG00000246250 NA RP11-613D13.5 ENSG00000246250
NA ENSG00000231628 NA RP3-355L5.4 ENSG00000231628
zinc finger and BTB domain containing 46 140685 NA ZBTB46 ENSG00000130584
islet cell autoantigen 1 3382 This gene encodes a protein with an arfaptin homology domain that is found both in the cytosol and as membrane-bound form on the Golgi complex and immature secretory granules. This protein is believed to be an autoantigen in insulin-dependent diabetes mellitus and primary Sjogren’s syndrome. Several transcript variants encoding two different isoforms have been found for this gene. ICA1 ENSG00000003147
ZDHHC20 intronic transcript 1 ENSG00000236953 NA ZDHHC20-IT1 ENSG00000236953
thyroid hormone receptor interactor 10 9322 NA TRIP10 ENSG00000125733
NA ENSG00000111788 NA RP11-22B23.1 ENSG00000111788
pentraxin 3 5806 NA PTX3 ENSG00000163661
pleckstrin homology like domain family A member 3 23612 NA PHLDA3 ENSG00000174307
nucleophosmin 1 (nucleolar phosphoprotein B23, numatrin) pseudogene 37 ENSG00000219085 NA NPM1P37 ENSG00000219085
heterogeneous nuclear ribonucleoprotein A3 pseudogene 9 ENSG00000270903 NA HNRNPA3P9 ENSG00000270903
GTF2I repeat domain containing 1 9569 The protein encoded by this gene contains five GTF2I-like repeats and each repeat possesses a potential helix-loop-helix (HLH) motif. It may have the ability to interact with other HLH-proteins and function as a transcription factor or as a positive transcriptional regulator under the control of Retinoblastoma protein. This gene plays a role in craniofacial and cognitive development and mutations have been associated with Williams-Beuren syndrome, a multisystem developmental disorder caused by deletion of multiple genes at 7q11.23. Alternative splicing results in multiple transcript variants. GTF2IRD1 ENSG00000006704
family with sequence similarity 43 member A 131583 NA FAM43A ENSG00000185112
tropomyosin 3 pseudogene 6 ENSG00000250731 NA TPM3P6 ENSG00000250731
Rho related BTB domain containing 1 9886 The protein encoded by this gene belongs to the Rho family of the small GTPase superfamily. It contains a GTPase domain, a proline-rich region, a tandem of 2 BTB (broad complex, tramtrack, and bric-a-brac) domains, and a conserved C-terminal region. The protein plays a role in small GTPase-mediated signal transduction and the organization of the actin filament system. Alternate splicing results in multiple transcript variants. RHOBTB1 ENSG00000072422
NA ENSG00000270075 NA RP11-127L20.5 ENSG00000270075
potassium voltage-gated channel subfamily J member 12 3768 This gene encodes an inwardly rectifying K+ channel which may be blocked by divalent cations. This protein is thought to be one of multiple inwardly rectifying channels which contribute to the cardiac inward rectifier current (IK1). The gene is located within the Smith-Magenis syndrome region on chromosome 17. KCNJ12 ENSG00000184185
erythrocyte membrane protein band 4.1 2035 The protein encoded by this gene, together with spectrin and actin, constitutes the red cell membrane cytoskeletal network. This complex plays a critical role in erythrocyte shape and deformability. Mutations in this gene are associated with type 1 elliptocytosis (EL1). Alternatively spliced transcript variants encoding different isoforms have been described for this gene. EPB41 ENSG00000159023
ATP binding cassette subfamily G member 1 9619 The protein encoded by this gene is a member of the superfamily of ATP-binding cassette (ABC) transporters. ABC proteins transport various molecules across extra- and intra-cellular membranes. ABC genes are divided into seven distinct subfamilies (ABC1, MDR/TAP, MRP, ALD, OABP, GCN20, White). This protein is a member of the White subfamily. It is involved in macrophage cholesterol and phospholipids transport, and may regulate cellular lipid homeostasis in other cell types. Six alternative splice variants have been identified. ABCG1 ENSG00000160179
uncoupling protein 3 7352 Mitochondrial uncoupling proteins (UCP) are members of the larger family of mitochondrial anion carrier proteins (MACP). UCPs separate oxidative phosphorylation from ATP synthesis with energy dissipated as heat, also referred to as the mitochondrial proton leak. UCPs facilitate the transfer of anions from the inner to the outer mitochondrial membrane and the return transfer of protons from the outer to the inner mitochondrial membrane. They also reduce the mitochondrial membrane potential in mammalian cells. The different UCPs have tissue-specific expression; this gene is primarily expressed in skeletal muscle. This gene’s protein product is postulated to protect mitochondria against lipid-induced oxidative stress. Expression levels of this gene increase when fatty acid supplies to mitochondria exceed their oxidation capacity and the protein enables the export of fatty acids from mitochondria. UCPs contain the three solcar protein domains typically found in MACPs. Two splice variants have been found for this gene. UCP3 ENSG00000175564
pancreatic lipase 5406 This gene is a member of the lipase gene family. It encodes a carboxyl esterase that hydrolyzes insoluble, emulsified triglycerides, and is essential for the efficient digestion of dietary fats. This gene is expressed specifically in the pancreas. PNLIP ENSG00000175535
NA ENSG00000243829 NA CTB-33G10.1 ENSG00000243829
# Save the Ensembl gene IDs queried for factor 3, one ID per line.
write.table(out$query, paste0("../utilities/GTEX2013_sparse_load_voom/gene_names_clus_", 3, ".txt"),
            col.names = FALSE, row.names = FALSE, quote = FALSE)
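
The query-then-write step is repeated once per factor, so it could be wrapped in a small helper. The sketch below is only an illustration and not part of the original analysis: the function name `write_factor_genes`, its `drop_notfound` argument, and the toy data frame are all hypothetical. It assumes the query result has a `query` column and, when some IDs are unmatched, a logical `notfound` column, as in the tables shown here. By default it writes every queried ID, matching the `write.table` call above.

```r
# Hypothetical helper (an assumption, not from the original analysis):
# write the queried Ensembl IDs for one factor to a text file, one per line.
write_factor_genes <- function(out, factor_id, out_dir, drop_notfound = FALSE) {
  df <- as.data.frame(out)
  # The notfound column is only present when at least one query was unmatched,
  # so check for it before filtering.
  if (drop_notfound && "notfound" %in% names(df)) {
    df <- df[is.na(df$notfound) | !df$notfound, , drop = FALSE]
  }
  path <- file.path(out_dir, paste0("gene_names_clus_", factor_id, ".txt"))
  write.table(df$query, path, col.names = FALSE, row.names = FALSE, quote = FALSE)
  invisible(path)
}

# Toy input mimicking the columns of the annotation tables; no network call.
toy <- data.frame(
  query    = c("ENSG00000136244", "ENSG00000179294", "ENSG00000108342"),
  symbol   = c("IL6", NA, "CSF3"),
  notfound = c(NA, TRUE, NA),
  stringsAsFactors = FALSE
)
path <- write_factor_genes(toy, 4, tempdir(), drop_notfound = TRUE)
written <- readLines(path)
```

With the real results, a call such as `write_factor_genes(out, 3, "../utilities/GTEX2013_sparse_load_voom")` would reproduce the file written above. Whether unmatched IDs should be dropped depends on what the downstream tool expects.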

Factor 4 Annotations

out <- mygene::queryMany(gene_list[4,],  scopes="ensembl.gene", fields=c("name", "summary", "symbol"), species="human");
## Finished
## Pass returnall=TRUE to return lists of duplicate or missing query terms.
kable(as.data.frame(out))
symbol query summary name X_id notfound
IL6 ENSG00000136244 This gene encodes a cytokine that functions in inflammation and the maturation of B cells. In addition, the encoded protein has been shown to be an endogenous pyrogen capable of inducing fever in people with autoimmune diseases or infections. The protein is primarily produced at sites of acute and chronic inflammation, where it is secreted into the serum and induces a transcriptional inflammatory response through interleukin 6 receptor, alpha. The functioning of this gene is implicated in a wide variety of inflammation-associated disease states, including susceptibility to diabetes mellitus and systemic juvenile rheumatoid arthritis. Alternative splicing results in multiple transcript variants. interleukin 6 3569 NA
CSF3 ENSG00000108342 The protein encoded by this gene is a cytokine that controls the production, differentiation, and function of granulocytes. The active protein is found extracellularly. Alternatively spliced transcript variants have been described for this gene. colony stimulating factor 3 1440 NA
CXCL8 ENSG00000169429 The protein encoded by this gene is a member of the CXC chemokine family. This chemokine is one of the major mediators of the inflammatory response. This chemokine is secreted by several cell types. It functions as a chemoattractant, and is also a potent angiogenic factor. This gene is believed to play a role in the pathogenesis of bronchiolitis, a common respiratory tract disease caused by viral infection. This gene and ten other members of the CXC chemokine gene family form a chemokine gene cluster in a region mapped to chromosome 4q. C-X-C motif chemokine ligand 8 3576 NA
SPRR2E ENSG00000203785 This gene encodes a member of a family of small proline-rich proteins clustered in the epidermal differentiation complex on chromosome 1q21. The encoded protein, along with other family members, is a component of the cornified cell envelope that forms beneath the plasma membrane in terminally differentiated stratified squamous epithelia. This envelope serves as a barrier against extracellular and environmental factors. The seven SPRR2 genes (A-G) appear to have been homogenized by gene conversion compared to others in the cluster that exhibit greater differences in protein structure. small proline rich protein 2E 6704 NA
PPP1R1A ENSG00000135447 NA protein phosphatase 1 regulatory inhibitor subunit 1A 5502 NA
SLURP1 ENSG00000126233 The protein encoded by this gene is a member of the Ly6/uPAR family but lacks a GPI-anchoring signal sequence. It is thought that this secreted protein contains antitumor activity. Mutations in this gene have been associated with Mal de Meleda, a rare autosomal recessive skin disorder. This gene maps to the same chromosomal region as several members of the Ly6/uPAR family of glycoprotein receptors. secreted LY6/PLAUR domain containing 1 57152 NA
ATP1B1 ENSG00000143153 The protein encoded by this gene belongs to the family of Na+/K+ and H+/K+ ATPases beta chain proteins, and to the subfamily of Na+/K+ -ATPases. Na+/K+ -ATPase is an integral membrane protein responsible for establishing and maintaining the electrochemical gradients of Na and K ions across the plasma membrane. These gradients are essential for osmoregulation, for sodium-coupled transport of a variety of organic and inorganic molecules, and for electrical excitability of nerve and muscle. This enzyme is composed of two subunits, a large catalytic subunit (alpha) and a smaller glycoprotein subunit (beta). The beta subunit regulates, through assembly of alpha/beta heterodimers, the number of sodium pumps transported to the plasma membrane. The glycoprotein subunit of Na+/K+ -ATPase is encoded by multiple genes. This gene encodes a beta 1 subunit. Alternatively spliced transcript variants encoding different isoforms have been described, but their biological validity is not known. ATPase Na+/K+ transporting subunit beta 1 481 NA
SLC1A1 ENSG00000106688 This gene encodes a member of the high-affinity glutamate transporters that play an essential role in transporting glutamate across plasma membranes. In brain, these transporters are crucial in terminating the postsynaptic action of the neurotransmitter glutamate, and in maintaining extracellular glutamate concentrations below neurotoxic levels. This transporter also transports aspartate, and mutations in this gene are thought to cause dicarboxylicamino aciduria, also known as glutamate-aspartate transport defect. solute carrier family 1 member 1 6505 NA
NA ENSG00000179294 NA NA NA TRUE
GPRC5B ENSG00000167191 This gene encodes a member of the type 3 G protein-coupled receptor family. Members of this superfamily are characterized by a signature 7-transmembrane domain motif. The encoded protein may modulate insulin secretion and increased protein expression is associated with type 2 diabetes. Alternative splicing results in multiple transcript variants. G protein-coupled receptor class C group 5 member B 51704 NA
NTRK1 ENSG00000198400 This gene encodes a member of the neurotrophic tyrosine kinase receptor (NTKR) family. This kinase is a membrane-bound receptor that, upon neurotrophin binding, phosphorylates itself and members of the MAPK pathway. The presence of this kinase leads to cell differentiation and may play a role in specifying sensory neuron subtypes. Mutations in this gene have been associated with congenital insensitivity to pain, anhidrosis, self-mutilating behavior, mental retardation and cancer. Alternate transcriptional splice variants of this gene have been found, but only three have been characterized to date. neurotrophic receptor tyrosine kinase 1 4914 NA
SOCS3 ENSG00000184557 This gene encodes a member of the STAT-induced STAT inhibitor (SSI), also known as suppressor of cytokine signaling (SOCS), family. SSI family members are cytokine-inducible negative regulators of cytokine signaling. The expression of this gene is induced by various cytokines, including IL6, IL10, and interferon (IFN)-gamma. The protein encoded by this gene can bind to JAK2 kinase, and inhibit the activity of JAK2 kinase. Studies of the mouse counterpart of this gene suggested the roles of this gene in the negative regulation of fetal liver hematopoiesis, and placental development. suppressor of cytokine signaling 3 9021 NA
RP11-845C23.3 ENSG00000267396 NA NA ENSG00000267396 NA
SPX ENSG00000134548 The protein encoded by this gene is a hormone involved in modulation of cardiovascular and renal function. It has also been shown in rats to cause weight loss. Several transcript variants have been found for this gene. spexin hormone 80763 NA
SPAG4 ENSG00000061656 The mammalian sperm flagellum contains two cytoskeletal structures associated with the axoneme: the outer dense fibers surrounding the axoneme in the midpiece and principal piece and the fibrous sheath surrounding the outer dense fibers in the principal piece of the tail. Defects in these structures are associated with abnormal tail morphology, reduced sperm motility, and infertility. In the rat, the protein encoded by this gene associates with an outer dense fiber protein via a leucine zipper motif and localizes to the microtubules of the manchette and axoneme during sperm tail development. Alternative splicing results in multiple transcript variants encoding different isoforms. sperm associated antigen 4 6676 NA
KRT14 ENSG00000186847 This gene encodes a member of the keratin family, the most diverse group of intermediate filaments. This gene product, a type I keratin, is usually found as a heterotetramer with two keratin 5 molecules, a type II keratin. Together they form the cytoskeleton of epithelial cells. Mutations in the genes for these keratins are associated with epidermolysis bullosa simplex. At least one pseudogene has been identified at 17p12-p11. keratin 14 3861 NA
RP11-334E6.12 ENSG00000263873 NA NA ENSG00000263873 NA
RBP1 ENSG00000114115 This gene encodes the carrier protein involved in the transport of retinol (vitamin A alcohol) from the liver storage site to peripheral tissue. Vitamin A is a fat-soluble vitamin necessary for growth, reproduction, differentiation of epithelial tissues, and vision. Multiple transcript variants encoding different isoforms have been found for this gene. retinol binding protein 1 5947 NA
SRSF12 ENSG00000154548 NA serine and arginine rich splicing factor 12 135295 NA
SPRR1A ENSG00000169474 NA small proline rich protein 1A 6698 NA
THY1 ENSG00000154096 This gene encodes a cell surface glycoprotein and member of the immunoglobulin superfamily of proteins. The encoded protein is involved in cell adhesion and cell communication in numerous cell types, but particularly in cells of the immune and nervous systems. The encoded protein is widely used as a marker for hematopoietic stem cells. This gene may function as a tumor suppressor in nasopharyngeal carcinoma. Alternative splicing results in multiple transcript variants. Thy-1 cell surface antigen 7070 NA
LIF ENSG00000128342 The protein encoded by this gene is a pleiotropic cytokine with roles in several different systems. It is involved in the induction of hematopoietic differentiation in normal and myeloid leukemia cells, induction of neuronal cell differentiation, regulator of mesenchymal to epithelial conversion during kidney development, and may also have a role in immune tolerance at the maternal-fetal interface. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene. leukemia inhibitory factor 3976 NA
RAMP1 ENSG00000132329 The protein encoded by this gene is a member of the RAMP family of single-transmembrane-domain proteins, called receptor (calcitonin) activity modifying proteins (RAMPs). RAMPs are type I transmembrane proteins with an extracellular N terminus and a cytoplasmic C terminus. RAMPs are required to transport calcitonin-receptor-like receptor (CRLR) to the plasma membrane. CRLR, a receptor with seven transmembrane domains, can function as either a calcitonin-gene-related peptide (CGRP) receptor or an adrenomedullin receptor, depending on which members of the RAMP family are expressed. In the presence of this (RAMP1) protein, CRLR functions as a CGRP receptor. The RAMP1 protein is involved in the terminal glycosylation, maturation, and presentation of the CGRP receptor to the cell surface. Alternative splicing results in multiple transcript variants encoding different isoforms. receptor activity modifying protein 1 10267 NA
LRRC2 ENSG00000163827 This gene encodes a member of the leucine-rich repeat-containing family of proteins, which function in diverse biological pathways. This family member may possibly be a nuclear protein. Similarity to the RAS suppressor protein, as well as expression down-regulation observed in tumor cells, suggests that it may function as a tumor suppressor. The gene is located in the chromosome 3 common eliminated region 1 (C3CER1), a 1.4 Mb region that is commonly deleted in diverse tumors. A related pseudogene has been identified on chromosome 2. leucine rich repeat containing 2 79442 NA
LGALS7B ENSG00000178934 The galectins are a family of beta-galactoside-binding proteins implicated in modulating cell-cell and cell-matrix interactions. Differential and in situ hybridization studies indicate that this lectin is specifically expressed in keratinocytes and found mainly in stratified squamous epithelium. A duplicate copy of this gene (GeneID:3963) is found adjacent to, but on the opposite strand on chromosome 19. galectin 7B 653499 NA
EPB41L4B ENSG00000095203 NA erythrocyte membrane protein band 4.1 like 4B 54566 NA
MESP1 ENSG00000166823 NA mesoderm posterior bHLH transcription factor 1 55897 NA
BAALC ENSG00000164929 This gene was identified by gene expression studies in patients with acute myeloid leukemia (AML). The gene is conserved among mammals and is not found in lower organisms. Tissues that express this gene develop from the neuroectoderm. Multiple alternatively spliced transcript variants that encode different proteins have been described for this gene; however, some of the transcript variants are found only in AML cell lines. brain and acute leukemia, cytoplasmic 79870 NA
SPRR1B ENSG00000169469 The protein encoded by this gene is an envelope protein of keratinocytes. The encoded protein is crosslinked to membrane proteins by transglutaminase, forming an insoluble layer under the plasma membrane. This protein is proline-rich and contains several tandem amino acid repeats. small proline rich protein 1B 6699 NA
RP3-414A15.10 ENSG00000258603 NA NA ENSG00000258603 NA
KIAA1161 ENSG00000164976 NA KIAA1161 57462 NA
IL32 ENSG00000008517 This gene encodes a member of the cytokine family. The protein contains a tyrosine sulfation site, 3 potential N-myristoylation sites, multiple putative phosphorylation sites, and an RGD cell-attachment sequence. Expression of this protein is increased after the activation of T-cells by mitogens or the activation of NK cells by IL-2. This protein induces the production of TNFalpha from macrophage cells. Alternate transcriptional splice variants, encoding different isoforms, have been characterized. interleukin 32 9235 NA
NKX3-1 ENSG00000167034 This gene encodes a homeobox-containing transcription factor. This transcription factor functions as a negative regulator of epithelial cell growth in prostate tissue. Aberrant expression of this gene is associated with prostate tumor progression. Alternate splicing results in multiple transcript variants of this gene. NK3 homeobox 1 4824 NA
ANGPTL4 ENSG00000167772 This gene encodes a glycosylated, secreted protein containing a C-terminal fibrinogen domain. The encoded protein is induced by peroxisome proliferation activators and functions as a serum hormone that regulates glucose homeostasis, lipid metabolism, and insulin sensitivity. This protein can also act as an apoptosis survival factor for vascular endothelial cells and can prevent metastasis by inhibiting vascular growth and tumor cell invasion. The C-terminal domain may be proteolytically-cleaved from the full-length secreted protein. Decreased expression of this gene has been associated with type 2 diabetes. Alternative splicing results in multiple transcript variants. This gene was previously referred to as ANGPTL2 but has been renamed ANGPTL4. angiopoietin like 4 51129 NA
S100A7 ENSG00000143556 The protein encoded by this gene is a member of the S100 family of proteins containing 2 EF-hand calcium-binding motifs. S100 proteins are localized in the cytoplasm and/or nucleus of a wide range of cells, and involved in the regulation of a number of cellular processes such as cell cycle progression and differentiation. S100 genes include at least 13 members which are located as a cluster on chromosome 1q21. This protein differs from the other S100 proteins of known structure in its lack of calcium binding ability in one EF-hand at the N-terminus. The protein is overexpressed in hyperproliferative skin diseases, exhibits antimicrobial activities against bacteria and induces immunomodulatory activities. S100 calcium binding protein A7 6278 NA
CXCL3 ENSG00000163734 This antimicrobial gene encodes a member of the CXC subfamily of chemokines. The encoded protein is a secreted growth factor that signals through the G-protein coupled receptor, CXC receptor 2. This protein plays a role in inflammation and as a chemoattractant for neutrophils. C-X-C motif chemokine ligand 3 2921 NA
RAB26 ENSG00000167964 Members of the RAB protein family, including RAB26, are important regulators of vesicular fusion and trafficking. The RAB family of small G proteins regulates intercellular vesicle trafficking, including exocytosis, endocytosis, and recycling (summary by Seki et al., 2000 [PubMed 11043516]). RAB26, member RAS oncogene family 25837 NA
DDAH1 ENSG00000153904 This gene belongs to the dimethylarginine dimethylaminohydrolase (DDAH) gene family. The encoded enzyme plays a role in nitric oxide generation by regulating cellular concentrations of methylarginines, which in turn inhibit nitric oxide synthase activity. dimethylarginine dimethylaminohydrolase 1 23576 NA
CXCL2 ENSG00000081041 This antimicrobial gene is part of a chemokine superfamily that encodes secreted proteins involved in immunoregulatory and inflammatory processes. The superfamily is divided into four subfamilies based on the arrangement of the N-terminal cysteine residues of the mature peptide. This chemokine, a member of the CXC subfamily, is expressed at sites of inflammation and may suppress hematopoietic progenitor cell proliferation. C-X-C motif chemokine ligand 2 2920 NA
IL20RB ENSG00000174564 IL20RB and IL20RA (MIM 605620) form a heterodimeric receptor for interleukin-20 (IL20; MIM 605619) (Blumberg et al., 2001 [PubMed 11163236]). interleukin 20 receptor subunit beta 53833 NA
EGR1 ENSG00000120738 The protein encoded by this gene belongs to the EGR family of C2H2-type zinc-finger proteins. It is a nuclear protein and functions as a transcriptional regulator. The products of target genes it activates are required for differentitation and mitogenesis. Studies suggest this is a cancer suppressor gene. early growth response 1 1958 NA
RP11-396F22.1 ENSG00000257718 NA NA ENSG00000257718 NA
CRCT1 ENSG00000169509 NA cysteine rich C-terminal 1 54544 NA
SLC4A4 ENSG00000080493 This gene encodes a sodium bicarbonate cotransporter (NBC) involved in the regulation of bicarbonate secretion and absorption and intracellular pH. Mutations in this gene are associated with proximal renal tubular acidosis. Multiple transcript variants encoding different isoforms have been found for this gene. solute carrier family 4 member 4 8671 NA
CALML5 ENSG00000178372 This gene encodes a novel calcium binding protein expressed in the epidermis and related to the calmodulin family of calcium binding proteins. Functional studies with recombinant protein demonstrate it does bind calcium and undergoes a conformational change when it does so. Abundant expression is detected only in reconstructed epidermis and is restricted to differentiating keratinocytes. In addition, it can associate with transglutaminase 3, shown to be a key enzyme in the terminal differentiation of keratinocytes. calmodulin like 5 51806 NA
SLC25A18 ENSG00000182902 NA solute carrier family 25 member 18 83733 NA
PNLIP ENSG00000175535 This gene is a member of the lipase gene family. It encodes a carboxyl esterase that hydrolyzes insoluble, emulsified triglycerides, and is essential for the efficient digestion of dietary fats. This gene is expressed specifically in the pancreas. pancreatic lipase 5406 NA
SHANK3 ENSG00000251322 NA SH3 and multiple ankyrin repeat domains 3 ENSG00000251322 NA
STMN2 ENSG00000104435 This gene encodes a member of the stathmin family of phosphoproteins. Stathmin proteins function in microtubule dynamics and signal transduction. The encoded protein plays a regulatory role in neuronal growth and is also thought to be involved in osteogenesis. Reductions in the expression of this gene have been associated with Down’s syndrome and Alzheimer’s disease. Alternatively spliced transcript variants have been observed for this gene. A pseudogene of this gene is located on the long arm of chromosome 6. stathmin 2 11075 NA
NNAT ENSG00000053438 The protein encoded by this gene is a proteolipid that may be involved in the regulation of ion channels during brain development. The encoded protein may also play a role in forming and maintaining the structure of the nervous system. This gene is found within an intron of another gene, bladder cancer associated protein, but on the opposite strand. This gene is imprinted and is expressed only from the paternal allele. neuronatin 4826 NA
SLC29A2 ENSG00000174669 The uptake of nucleosides by transporters, such as SLC29A2, is essential for nucleotide synthesis by salvage pathways in cells that lack de novo biosynthetic pathways. Nucleoside transport also plays a key role in the regulation of many physiologic processes through its effect on adenosine concentration at the cell surface (Griffiths et al., 1997 [PubMed 9396714]). solute carrier family 29 member 2 3177 NA
KCNJ12 ENSG00000184185 This gene encodes an inwardly rectifying K+ channel which may be blocked by divalent cations. This protein is thought to be one of multiple inwardly rectifying channels which contribute to the cardiac inward rectifier current (IK1). The gene is located within the Smith-Magenis syndrome region on chromosome 17. potassium voltage-gated channel subfamily J member 12 3768 NA
SLCO4A1 ENSG00000101187 NA solute carrier organic anion transporter family member 4A1 28231 NA
OLAH ENSG00000152463 NA oleoyl-ACP hydrolase 55301 NA
UCHL1 ENSG00000154277 The protein encoded by this gene belongs to the peptidase C12 family. This enzyme is a thiol protease that hydrolyzes a peptide bond at the C-terminal glycine of ubiquitin. This gene is specifically expressed in the neurons and in cells of the diffuse neuroendocrine system. Mutations in this gene may be associated with Parkinson disease. ubiquitin C-terminal hydrolase L1 7345 NA
IL12A ENSG00000168811 This gene encodes a subunit of a cytokine that acts on T and natural killer cells, and has a broad array of biological activities. The cytokine is a disulfide-linked heterodimer composed of the 35-kD subunit encoded by this gene, and a 40-kD subunit that is a member of the cytokine receptor family. This cytokine is required for the T-cell-independent induction of interferon (IFN)-gamma, and is important for the differentiation of both Th1 and Th2 cells. The responses of lymphocytes to this cytokine are mediated by the activator of transcription protein STAT4. Nitric oxide synthase 2A (NOS2A/NOS2) is found to be required for the signaling process of this cytokine in innate immunity. interleukin 12A 3592 NA
SLC29A4 ENSG00000164638 This gene encodes a member of the SLC29A/ENT transporter protein family. The encoded membrane protein catalyzes the reuptake of monoamines into presynaptic neurons, thus determining the intensity and duration of monoamine neural signaling. It has been shown to transport several compounds, including serotonin, dopamine, and the neurotoxin 1-methyl-4-phenylpyridinium. Alternative splicing results in multiple transcript variants. solute carrier family 29 member 4 222962 NA
ARC ENSG00000198576 NA activity-regulated cytoskeleton-associated protein 23237 NA
PFN2 ENSG00000070087 The protein encoded by this gene is a ubiquitous actin monomer-binding protein belonging to the profilin family. It is thought to regulate actin polymerization in response to extracellular signals. There are two alternatively spliced transcript variants encoding different isoforms described for this gene. profilin 2 5217 NA
APOC1 ENSG00000130208 This gene encodes a member of the apolipoprotein C1 family. This gene is expressed primarily in the liver, and it is activated when monocytes differentiate into macrophages. The encoded protein plays a central role in high density lipoprotein (HDL) and very low density lipoprotein (VLDL) metabolism. This protein has also been shown to inhibit cholesteryl ester transfer protein in plasma. A pseudogene of this gene is located 4 kb downstream in the same orientation, on the same chromosome. This gene is mapped to chromosome 19, where it resides within a apolipoprotein gene cluster. apolipoprotein C1 341 NA
KRT1 ENSG00000167768 The protein encoded by this gene is a member of the keratin gene family. The type II cytokeratins consist of basic or neutral proteins which are arranged in pairs of heterotypic keratin chains coexpressed during differentiation of simple and stratified epithelial tissues. This type II cytokeratin is specifically expressed in the spinous and granular layers of the epidermis with family member KRT10 and mutations in these genes have been associated with bullous congenital ichthyosiform erythroderma. The type II cytokeratins are clustered in a region of chromosome 12q12-q13. keratin 1 3848 NA
PITPNC1 ENSG00000154217 This gene encodes a member of the phosphatidylinositol transfer protein family. The encoded cytoplasmic protein plays a role in multiple processes including cell signaling and lipid metabolism by facilitating the transfer of phosphatidylinositol between membrane compartments. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene, and a pseudogene of this gene is located on the long arm of chromosome 1. phosphatidylinositol transfer protein, cytoplasmic 1 26207 NA
BATF3 ENSG00000123685 This gene encodes a member of the basic leucine zipper protein family. The encoded protein functions as a transcriptional repressor when heterodimerizing with JUN. The protein may play a role in repression of interleukin-2 and matrix metalloproteinase-1 transcription. basic leucine zipper ATF-like transcription factor 3 55509 NA
REG3A ENSG00000172016 This gene encodes a pancreatic secretory protein that may be involved in cell proliferation or differentiation. It has similarity to the C-type lectin superfamily. The enhanced expression of this gene is observed during pancreatic inflammation and liver carcinogenesis. The mature protein also functions as an antimicrobial protein with antibacterial activity. Alternate splicing results in multiple transcript variants that encode the same protein. regenerating family member 3 alpha 5068 NA
ITPKA ENSG00000137825 Regulates inositol phosphate metabolism by phosphorylation of second messenger inositol 1,4,5-trisphosphate to Ins(1,3,4,5)P4. The activity of the inositol 1,4,5-trisphosphate 3-kinase is responsible for regulating the levels of a large number of inositol polyphosphates that are important in cellular signaling. Both calcium/calmodulin and protein phosphorylation mechanisms control its activity. It is also a substrate for the cyclic AMP-dependent protein kinase, calcium/calmodulin- dependent protein kinase II, and protein kinase C in vitro. inositol-trisphosphate 3-kinase A 3706 NA
LY6D ENSG00000167656 NA lymphocyte antigen 6 complex, locus D 8581 NA
CXCL1 ENSG00000163739 This antimicrobial gene encodes a member of the CXC subfamily of chemokines. The encoded protein is a secreted growth factor that signals through the G-protein coupled receptor, CXC receptor 2. This protein plays a role in inflammation and as a chemoattractant for neutrophils. Aberrant expression of this protein is associated with the growth and progression of certain tumors. A naturally occurring processed form of this protein has increased chemotactic activity. Alternate splicing results in coding and non-coding variants of this gene. A pseudogene of this gene is found on chromosome 4. C-X-C motif chemokine ligand 1 2919 NA
TOX2 ENSG00000124191 NA TOX high mobility group box family member 2 84969 NA
KLF10 ENSG00000155090 This gene encodes a member of a family of proteins that feature C2H2-type zinc finger domains. The encoded protein is a transcriptional repressor that acts as an effector of transforming growth factor beta signaling. Activity of this protein may inhibit the growth of cancers, particularly pancreatic cancer. Alternative splicing results in multiple transcript variants. Kruppel like factor 10 7071 NA
SLCO4A1-AS1 ENSG00000232803 NA SLCO4A1 antisense RNA 1 100127888 NA
COX7A1 ENSG00000161281 Cytochrome c oxidase (COX), the terminal component of the mitochondrial respiratory chain, catalyzes the electron transfer from reduced cytochrome c to oxygen. This component is a heteromeric complex consisting of 3 catalytic subunits encoded by mitochondrial genes and multiple structural subunits encoded by nuclear genes. The mitochondrially-encoded subunits function in electron transfer, and the nuclear-encoded subunits may function in the regulation and assembly of the complex. This nuclear gene encodes polypeptide 1 (muscle isoform) of subunit VIIa and the polypeptide 1 is present only in muscle tissues. Other polypeptides of subunit VIIa are present in both muscle and nonmuscle tissues, and are encoded by different genes. cytochrome c oxidase subunit 7A1 1346 NA
FAM107A ENSG00000168309 NA family with sequence similarity 107 member A 11170 NA
NA ENSG00000267473 NA NA NA TRUE
SEZ6L2 ENSG00000174938 This gene encodes a seizure-related protein that is localized on the cell surface. The gene is located in a region of chromosome 16p11.2 that is thought to contain candidate genes for autism spectrum disorders (ASD), though there is no evidence directly implicating this gene in ASD. Increased expression of this gene has been found in lung cancers, and the protein is therefore considered to be a novel prognostic marker for lung cancer. Alternative splicing of this gene results in multiple transcript variants. seizure related 6 homolog like 2 26470 NA
TMEM178A ENSG00000152154 NA transmembrane protein 178A 130733 NA
GPD1 ENSG00000167588 This gene encodes a member of the NAD-dependent glycerol-3-phosphate dehydrogenase family. The encoded protein plays a critical role in carbohydrate and lipid metabolism by catalyzing the reversible conversion of dihydroxyacetone phosphate (DHAP) and reduced nicotine adenine dinucleotide (NADH) to glycerol-3-phosphate (G3P) and NAD+. The encoded cytosolic protein and mitochondrial glycerol-3-phosphate dehydrogenase also form a glycerol phosphate shuttle that facilitates the transfer of reducing equivalents from the cytosol to mitochondria. Mutations in this gene are a cause of transient infantile hypertriglyceridemia. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene. glycerol-3-phosphate dehydrogenase 1 2819 NA
SMIM5 ENSG00000204323 NA small integral membrane protein 5 643008 NA
CA3 ENSG00000164879 Carbonic anhydrase III (CAIII) is a member of a multigene family (at least six separate genes are known) that encodes carbonic anhydrase isozymes. These carbonic anhydrases are a class of metalloenzymes that catalyze the reversible hydration of carbon dioxide and are differentially expressed in a number of cell types. The expression of the CA3 gene is strictly tissue specific and present at high levels in skeletal muscle and much lower levels in cardiac and smooth muscle. A proportion of carriers of Duchenne muscle dystrophy have a higher CA3 level than normal. The gene spans 10.3 kb and contains seven exons and six introns. carbonic anhydrase 3 761 NA
JPH1 ENSG00000104369 Junctional complexes between the plasma membrane and endoplasmic/sarcoplasmic reticulum are a common feature of all excitable cell types and mediate cross talk between cell surface and intracellular ion channels. The protein encoded by this gene is a component of junctional complexes and is composed of a C-terminal hydrophobic segment spanning the endoplasmic/sarcoplasmic reticulum membrane and a remaining cytoplasmic domain that shows specific affinity for the plasma membrane. This gene is a member of the junctophilin gene family. junctophilin 1 56704 NA
SLC25A22 ENSG00000177542 This gene encodes a mitochondrial glutamate carrier. Mutations in this gene are associated with early infantile epileptic encephalopathy. Multiple alternatively spliced variants, encoding the same protein, have been identified. solute carrier family 25 member 22 79751 NA
FAM153B ENSG00000182230 NA family with sequence similarity 153 member B 202134 NA
LOC100507387 ENSG00000182230 NA uncharacterized LOC100507387 100507387 NA
RP11-490M8.1 ENSG00000260025 NA NA ENSG00000260025 NA
ICAM5 ENSG00000105376 The protein encoded by this gene is a member of the intercellular adhesion molecule (ICAM) family. All ICAM proteins are type I transmembrane glycoproteins, contain 2-9 immunoglobulin-like C2-type domains, and bind to the leukocyte adhesion LFA-1 protein. This protein is expressed on the surface of telencephalic neurons and displays two types of adhesion activity, homophilic binding between neurons and heterophilic binding between neurons and leukocytes. It may be a critical component in neuron-microglial cell interactions in the course of normal development or as part of neurodegenerative diseases. intercellular adhesion molecule 5 7087 NA
NOCT ENSG00000151014 The protein encoded by this gene is highly similar to Nocturnin, a gene identified as a circadian clock regulated gene in Xenopus laevis. This protein and Nocturnin protein share similarity with the C-terminal domain of a yeast transcription factor, carbon catabolite repression 4 (CCR4). The mRNA abundance of a similar gene in mouse has been shown to exhibit circadian rhythmicity, which suggests a role for this protein in clock function or as a circadian clock effector. nocturnin 25819 NA
CYP27B1 ENSG00000111012 This gene encodes a member of the cytochrome P450 superfamily of enzymes. The cytochrome P450 proteins are monooxygenases which catalyze many reactions involved in drug metabolism and synthesis of cholesterol, steroids and other lipids. The protein encoded by this gene localizes to the inner mitochondrial membrane where it hydroxylates 25-hydroxyvitamin D3 at the 1alpha position. This reaction synthesizes 1alpha,25-dihydroxyvitamin D3, the active form of vitamin D3, which binds to the vitamin D receptor and regulates calcium metabolism. Thus this enzyme regulates the level of biologically active vitamin D and plays an important role in calcium homeostasis. Mutations in this gene can result in vitamin D-dependent rickets type I. cytochrome P450 family 27 subfamily B member 1 1594 NA
CA3-AS1 ENSG00000253549 NA CA3 antisense RNA 1 100996348 NA
BNIPL ENSG00000163141 The protein encoded by this gene interacts with several other proteins, such as BCL2, ARHGAP1, MIF and GFER. It may function as a bridge molecule between BCL2 and ARHGAP1/CDC42 in promoting cell death. Alternatively spliced transcript variants encoding different isoforms have been described for this gene. BCL2/adenovirus E1B 19kD interacting protein like 149428 NA
KCNIP4 ENSG00000185774 This gene encodes a member of the family of voltage-gated potassium (Kv) channel-interacting proteins (KCNIPs), which belong to the recoverin branch of the EF-hand superfamily. Members of the KCNIP family are small calcium binding proteins. They all have EF-hand-like domains, and differ from each other in the N-terminus. They are integral subunit components of native Kv4 channel complexes. They may regulate A-type currents, and hence neuronal excitability, in response to changes in intracellular calcium. This protein member also interacts with presenilin. Multiple alternatively spliced transcript variants encoding distinct isoforms have been identified for this gene. potassium voltage-gated channel interacting protein 4 80333 NA
ANK2 ENSG00000145362 This gene encodes a member of the ankyrin family of proteins that link the integral membrane proteins to the underlying spectrin-actin cytoskeleton. Ankyrins play key roles in activities such as cell motility, activation, proliferation, contact and the maintenance of specialized membrane domains. Most ankyrins are typically composed of three structural domains: an amino-terminal domain containing multiple ankyrin repeats; a central region with a highly conserved spectrin binding domain; and a carboxy-terminal regulatory domain which is the least conserved and subject to variation. The protein encoded by this gene is required for targeting and stability of Na/Ca exchanger 1 in cardiomyocytes. Mutations in this gene cause long QT syndrome 4 and cardiac arrhythmia syndrome. Multiple transcript variants encoding different isoforms have been described. ankyrin 2, neuronal 287 NA
HSPH1 ENSG00000120694 NA heat shock protein family H (Hsp110) member 1 10808 NA
PRSS3 ENSG00000010438 This gene encodes a trypsinogen, which is a member of the trypsin family of serine proteases. This enzyme is expressed in the brain and pancreas and is resistant to common trypsin inhibitors. It is active on peptide linkages involving the carboxyl group of lysine or arginine. This gene is localized to the locus of T cell receptor beta variable orphans on chromosome 9. Four transcript variants encoding different isoforms have been described for this gene. protease, serine 3 5646 NA
LOC100129518 ENSG00000112096 NA uncharacterized LOC100129518 100129518 NA
SOD2 ENSG00000112096 This gene is a member of the iron/manganese superoxide dismutase family. It encodes a mitochondrial protein that forms a homotetramer and binds one manganese ion per subunit. This protein binds to the superoxide byproducts of oxidative phosphorylation and converts them to hydrogen peroxide and diatomic oxygen. Mutations in this gene have been associated with idiopathic cardiomyopathy (IDC), premature aging, sporadic motor neuron disease, and cancer. Alternative splicing of this gene results in multiple transcript variants. A related pseudogene has been identified on chromosome 1. superoxide dismutase 2, mitochondrial 6648 NA
FAM171B ENSG00000144369 NA family with sequence similarity 171 member B 165215 NA
MT1X ENSG00000187193 NA metallothionein 1X 4501 NA
LIPE ENSG00000079435 The protein encoded by this gene has a long and a short form, generated by use of alternative translational start codons. The long form is expressed in steroidogenic tissues such as testis, where it converts cholesteryl esters to free cholesterol for steroid hormone production. The short form is expressed in adipose tissue, among others, where it hydrolyzes stored triglycerides to free fatty acids. lipase E, hormone sensitive type 3991 NA
ACACB ENSG00000076555 Acetyl-CoA carboxylase (ACC) is a complex multifunctional enzyme system. ACC is a biotin-containing enzyme which catalyzes the carboxylation of acetyl-CoA to malonyl-CoA, the rate-limiting step in fatty acid synthesis. ACC-beta is thought to control fatty acid oxidation by means of the ability of malonyl-CoA to inhibit carnitine-palmitoyl-CoA transferase I, the rate-limiting step in fatty acid uptake and oxidation by mitochondria. ACC-beta may be involved in the regulation of fatty acid oxidation, rather than fatty acid biosynthesis. There is evidence for the presence of two ACC-beta isoforms. acetyl-CoA carboxylase beta 32 NA
PCK1 ENSG00000124253 This gene is a main control point for the regulation of gluconeogenesis. The cytosolic enzyme encoded by this gene, along with GTP, catalyzes the formation of phosphoenolpyruvate from oxaloacetate, with the release of carbon dioxide and GDP. The expression of this gene can be regulated by insulin, glucocorticoids, glucagon, cAMP, and diet. Defects in this gene are a cause of cytosolic phosphoenolpyruvate carboxykinase deficiency. A mitochondrial isozyme of the encoded protein also has been characterized. phosphoenolpyruvate carboxykinase 1 5105 NA
DNM1 ENSG00000106976 This gene encodes a member of the dynamin subfamily of GTP-binding proteins. The encoded protein possesses unique mechanochemical properties used to tubulate and sever membranes, and is involved in clathrin-mediated endocytosis and other vesicular trafficking processes. Actin and other cytoskeletal proteins act as binding partners for the encoded protein, which can also self-assemble leading to stimulation of GTPase activity. More than sixty highly conserved copies of the 3’ region of this gene are found elsewhere in the genome, particularly on chromosomes Y and 15. Alternatively spliced transcript variants encoding different isoforms have been described. dynamin 1 1759 NA
PSD ENSG00000059915 This gene encodes a Plekstrin homology and SEC7 domains-containing protein that functions as a guanine nucleotide exchange factor. The encoded protein regulates signal transduction by activating ADP-ribosylation factor 6. Alternative splicing results in multiple transcript variants. pleckstrin and Sec7 domain containing 5662 NA
CEND1 ENSG00000184524 The protein encoded by this gene is a neuron-specific protein. The similar protein in pig enhances neuroblastoma cell differentiation in vitro and may be involved in neuronal differentiation in vivo. Multiple pseudogenes have been reported for this gene. cell cycle exit and neuronal differentiation 1 51286 NA
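# Save the queried Ensembl gene IDs for factor 4, one ID per line, with no header, row names, or quotes.
write.table(as.factor(out$query), paste0("../utilities/GTEX2013_sparse_load_voom/gene_names_clus_",4,".txt"), col.names = FALSE,
            row.names=FALSE, quote=FALSE);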

Factor 5 Annotations

out <- mygene::queryMany(gene_list[5,],  scopes="ensembl.gene", fields=c("name", "summary", "symbol"), species="human");
## Finished
## Pass returnall=TRUE to return lists of duplicate or missing query terms.
kable(as.data.frame(out))
name query symbol summary X_id notfound
naked cuticle homolog 2 ENSG00000145506 NKD2 This gene encodes a member of a family of proteins that function as negative regulators of Wnt receptor signaling through interaction with Dishevelled family members. The encoded protein participates in the delivery of transforming growth factor alpha-containing vesicles to the cell membrane. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene. 85409 NA
apoptosis inducing factor, mitochondria associated 3 ENSG00000183773 AIFM3 NA 150209 NA
pleckstrin and Sec7 domain containing ENSG00000059915 PSD This gene encodes a Pleckstrin homology and SEC7 domains-containing protein that functions as a guanine nucleotide exchange factor. The encoded protein regulates signal transduction by activating ADP-ribosylation factor 6. Alternative splicing results in multiple transcript variants. 5662 NA
Thy-1 cell surface antigen ENSG00000154096 THY1 This gene encodes a cell surface glycoprotein and member of the immunoglobulin superfamily of proteins. The encoded protein is involved in cell adhesion and cell communication in numerous cell types, but particularly in cells of the immune and nervous systems. The encoded protein is widely used as a marker for hematopoietic stem cells. This gene may function as a tumor suppressor in nasopharyngeal carcinoma. Alternative splicing results in multiple transcript variants. 7070 NA
butyrylcholinesterase ENSG00000114200 BCHE Mutant alleles at the BCHE locus are responsible for suxamethonium sensitivity. Homozygous persons sustain prolonged apnea after administration of the muscle relaxant suxamethonium in connection with surgical anesthesia. The activity of pseudocholinesterase in the serum is low and its substrate behavior is atypical. In the absence of the relaxant, the homozygote is at no known disadvantage. 590 NA
NA ENSG00000263873 RP11-334E6.12 NA ENSG00000263873 NA
tachykinin receptor 2 ENSG00000075073 TACR2 This gene belongs to a family of genes that function as receptors for tachykinins. Receptor affinities are specified by variations in the 5’-end of the sequence. The receptors belonging to this family are characterized by interactions with G proteins and 7 hydrophobic transmembrane regions. This gene encodes the receptor for the tachykinin neuropeptide substance K, also referred to as neurokinin A. 6865 NA
fucosyltransferase 2 ENSG00000176920 FUT2 The protein encoded by this gene is a Golgi stack membrane protein that is involved in the creation of a precursor of the H antigen, which is required for the final step in the soluble A and B antigen synthesis pathway. This gene is one of two encoding the galactoside 2-L-fucosyltransferase enzyme. Two transcript variants encoding the same protein have been found for this gene. 2524 NA
S100 calcium binding protein A14 ENSG00000189334 S100A14 This gene encodes a member of the S100 protein family which contains an EF-hand motif and binds calcium. The gene is located in a cluster of S100 genes on chromosome 1. Levels of the encoded protein have been found to be lower in cancerous tissue and associated with metastasis suggesting a tumor suppressor function (PMID: 19956863, 19351828). 57402 NA
peptidyl arginine deiminase 2 ENSG00000117115 PADI2 This gene encodes a member of the peptidyl arginine deiminase family of enzymes, which catalyze the post-translational deimination of proteins by converting arginine residues into citrullines in the presence of calcium ions. The family members have distinct substrate specificities and tissue-specific expression patterns. The type II enzyme is the most widely expressed family member. Known substrates for this enzyme include myelin basic protein in the central nervous system and vimentin in skeletal muscle and macrophages. This enzyme is thought to play a role in the onset and progression of neurodegenerative human disorders, including Alzheimer disease and multiple sclerosis, and it has also been implicated in glaucoma pathogenesis. This gene exists in a cluster with four other paralogous genes. 11240 NA
NA ENSG00000257499 NA NA NA TRUE
serine peptidase inhibitor, Kunitz type 1 ENSG00000166145 SPINT1 The protein encoded by this gene is a member of the Kunitz family of serine protease inhibitors. The protein is a potent inhibitor specific for HGF activator and is thought to be involved in the regulation of the proteolytic activation of HGF in injured tissues. Alternative splicing results in multiple variants encoding different isoforms. 6692 NA
castor zinc finger 1 ENSG00000130940 CASZ1 The protein encoded by this gene is a zinc finger transcription factor. The encoded protein may function as a tumor suppressor, and single nucleotide polymorphisms in this gene are associated with blood pressure variation. Alternative splicing results in multiple transcript variants that encode different protein isoforms. 54897 NA
ArfGAP with GTPase domain, ankyrin repeat and PH domain 2 ENSG00000135439 AGAP2 The protein encoded by this gene belongs to the centaurin gamma-like family. It mediates anti-apoptotic effects of nerve growth factor by activating nuclear phosphoinositide 3-kinase. It is overexpressed in cancer cells, and promotes cancer cell invasion. Alternatively spliced transcript variants encoding different isoforms have been described for this gene. 116986 NA
synemin ENSG00000182253 SYNM The protein encoded by this gene is an intermediate filament (IF) family member. IF proteins are cytoskeletal proteins that confer resistance to mechanical stress and are encoded by a dispersed multigene family. This protein has been found to form a linkage between desmin, which is a subunit of the IF network, and the extracellular matrix, and provides an important structural support in muscle. Two alternatively spliced variants encoding different isoforms have been described for this gene. 23336 NA
ANO1 antisense RNA 1 ENSG00000254902 ANO1-AS1 NA ENSG00000254902 NA
family with sequence similarity 46 member B ENSG00000158246 FAM46B NA 115572 NA
galactosidase beta 1 like 2 ENSG00000149328 GLB1L2 NA 89944 NA
cytochrome P450 family 4 subfamily F member 29, pseudogene ENSG00000228314 CYP4F29P NA 54055 NA
protein tyrosine kinase 6 ENSG00000101213 PTK6 The protein encoded by this gene is a cytoplasmic nonreceptor protein kinase which may function as an intracellular signal transducer in epithelial tissues. Overexpression of this gene in mammary epithelial cells leads to sensitization of the cells to epidermal growth factor and results in a partially transformed phenotype. Expression of this gene has been detected at low levels in some breast tumors but not in normal breast tissue. The encoded protein has been shown to undergo autophosphorylation. Alternative splicing results in multiple transcript variants. 5753 NA
circadian associated repressor of transcription ENSG00000159208 CIART NA 148523 NA
calponin 1 ENSG00000130176 CNN1 NA 1264 NA
RAR related orphan receptor A ENSG00000069667 RORA The protein encoded by this gene is a member of the NR1 subfamily of nuclear hormone receptors. It can bind as a monomer or as a homodimer to hormone response elements upstream of several genes to enhance the expression of those genes. The encoded protein has been shown to interact with NM23-2, a nucleoside diphosphate kinase involved in organogenesis and differentiation, as well as with NM23-1, the product of a tumor metastasis suppressor candidate gene. Also, it has been shown to aid in the transcriptional regulation of some genes involved in circadian rhythm. Four transcript variants encoding different isoforms have been described for this gene. 6095 NA
transmembrane protein 52 ENSG00000178821 TMEM52 NA 339456 NA
early growth response 3 ENSG00000179388 EGR3 This gene encodes a transcriptional regulator that belongs to the EGR family of C2H2-type zinc-finger proteins. It is an immediate-early growth response gene which is induced by mitogenic stimulation. The protein encoded by this gene participates in the transcriptional regulation of genes controlling biological rhythm. It may also play a role in a wide variety of processes including muscle development, lymphocyte development, endothelial cell growth and migration, and neuronal development. Alternative splicing results in multiple transcript variants encoding distinct isoforms. 1960 NA
sorbin and SH3 domain containing 1 ENSG00000095637 SORBS1 This gene encodes a CBL-associated protein which functions in the signaling and stimulation of insulin. Mutations in this gene may be associated with human disorders of insulin resistance. Alternative splicing results in multiple transcript variants. 10580 NA
prominin 2 ENSG00000155066 PROM2 This gene encodes a member of the prominin family of pentaspan membrane glycoproteins. The encoded protein localizes to basal epithelial cells and may be involved in the organization of plasma membrane microdomains. Alternative splicing results in multiple transcript variants. 150696 NA
angiomotin like 1 ENSG00000166025 AMOTL1 The protein encoded by this gene is a peripheral membrane protein that is a component of tight junctions or TJs. TJs form an apical junctional structure and act to control paracellular permeability and maintain cell polarity. This protein is related to angiomotin, an angiostatin binding protein that regulates endothelial cell migration and capillary formation. Two transcript variants encoding different isoforms have been found for this gene. 154810 NA
dual oxidase 1 ENSG00000137857 DUOX1 The protein encoded by this gene is a glycoprotein and a member of the NADPH oxidase family. The synthesis of thyroid hormone is catalyzed by a protein complex located at the apical membrane of thyroid follicular cells. This complex contains an iodide transporter, thyroperoxidase, and a peroxide generating system that includes proteins encoded by this gene and the similar DUOX2 gene. This protein is known as dual oxidase because it has both a peroxidase homology domain and a gp91phox domain. This protein generates hydrogen peroxide and thereby plays a role in the activity of thyroid peroxidase, lactoperoxidase, and in lactoperoxidase-mediated antimicrobial defense at mucosal surfaces. Two alternatively spliced transcript variants encoding the same protein have been described for this gene. 53905 NA
NA ENSG00000261762 RP11-650L12.2 NA ENSG00000261762 NA
A-kinase anchoring protein 1 ENSG00000121057 AKAP1 The A-kinase anchor proteins (AKAPs) are a group of structurally diverse proteins, which have the common function of binding to the regulatory subunit of protein kinase A (PKA) and confining the holoenzyme to discrete locations within the cell. This gene encodes a member of the AKAP family. The encoded protein binds to type I and type II regulatory subunits of PKA and anchors them to the mitochondrion. This protein is speculated to be involved in the cAMP-dependent signal transduction pathway and in directing RNA to a specific cellular compartment. 8165 NA
tryptase alpha/beta 1 ENSG00000172236 TPSAB1 Tryptases comprise a family of trypsin-like serine proteases, the peptidase family S1. Tryptases are enzymatically active only as heparin-stabilized tetramers, and they are resistant to all known endogenous proteinase inhibitors. Several tryptase genes are clustered on chromosome 16p13.3. These genes are characterized by several distinct features. They have a highly conserved 3’ UTR and contain tandem repeat sequences at the 5’ flank and 3’ UTR which are thought to play a role in regulation of the mRNA stability. These genes have an intron immediately upstream of the initiator Met codon, which separates the site of transcription initiation from protein coding sequence. This feature is characteristic of tryptases but is unusual in other genes. The alleles of this gene exhibit an unusual amount of sequence variation, such that the alleles were once thought to represent two separate genes, alpha and beta 1. Beta tryptases appear to be the main isoenzymes expressed in mast cells; whereas in basophils, alpha tryptases predominate. Tryptases have been implicated as mediators in the pathogenesis of asthma and other allergic and inflammatory disorders. 7177 NA
NA ENSG00000267940 RP11-290F24.6 NA ENSG00000267940 NA
complexin 1 ENSG00000168993 CPLX1 Proteins encoded by the complexin/synaphin gene family are cytosolic proteins that function in synaptic vesicle exocytosis. These proteins bind syntaxin, part of the SNAP receptor. The protein product of this gene binds to the SNAP receptor complex and disrupts it, allowing transmitter release. 10815 NA
WAP four-disulfide core domain 3 ENSG00000124116 WFDC3 This gene encodes a member of the WAP-type four-disulfide core (WFDC) domain family. The WFDC domain, or WAP signature motif, contains eight cysteines forming four disulfide bonds at the core of the protein, and functions as a protease inhibitor. The encoded protein contains four WFDC domains. Most WFDC genes are localized to chromosome 20q12-q13 in two clusters: centromeric and telomeric. This gene belongs to the telomeric cluster. Alternatively spliced transcript variants have been observed but their full-length nature has not been determined. 140686 NA
NA ENSG00000261054 RP11-6O2.4 NA ENSG00000261054 NA
coiled-coil domain containing 85C ENSG00000205476 CCDC85C NA 317762 NA
NA ENSG00000256469 RP11-856F16.2 NA ENSG00000256469 NA
NA ENSG00000213144 RP11-64B16.2 NA ENSG00000213144 NA
transglutaminase 1 ENSG00000092295 TGM1 The protein encoded by this gene is a membrane protein that catalyzes the addition of an alkyl group from an alkylamine to a glutamine residue of a protein, forming an alkylglutamine in the protein. This protein alkylation leads to crosslinking of proteins and catenation of polyamines to proteins. This gene contains either one or two copies of a 22 nt repeat unit in its 3’ UTR. Mutations in this gene have been associated with autosomal recessive lamellar ichthyosis (LI) and nonbullous congenital ichthyosiform erythroderma (NCIE). 7051 NA
BCL2 associated athanogene 3 ENSG00000151929 BAG3 BAG proteins compete with Hip for binding to the Hsc70/Hsp70 ATPase domain and promote substrate release. All the BAG proteins have an approximately 45-amino acid BAG domain near the C terminus but differ markedly in their N-terminal regions. The protein encoded by this gene contains a WW domain in the N-terminal region and a BAG domain in the C-terminal region. The BAG domains of BAG1, BAG2, and BAG3 interact specifically with the Hsc70 ATPase domain in vitro and in mammalian cells. All 3 proteins bind with high affinity to the ATPase domain of Hsc70 and inhibit its chaperone activity in a Hip-repressible manner. 9531 NA
podoplanin ENSG00000162493 PDPN This gene encodes a type-I integral membrane glycoprotein with diverse distribution in human tissues. The physiological function of this protein may be related to its mucin-type character. The homologous protein in other species has been described as a differentiation antigen and influenza-virus receptor. The specific function of this protein has not been determined but it has been proposed as a marker of lung injury. Alternatively spliced transcript variants encoding different isoforms have been identified. 10630 NA
myosin light chain kinase ENSG00000065534 MYLK This gene, a muscle member of the immunoglobulin gene superfamily, encodes myosin light chain kinase which is a calcium/calmodulin dependent enzyme. This kinase phosphorylates myosin regulatory light chains to facilitate myosin interaction with actin filaments to produce contractile activity. This gene encodes both smooth muscle and nonmuscle isoforms. In addition, using a separate promoter in an intron in the 3’ region, it encodes telokin, a small protein identical in sequence to the C-terminus of myosin light chain kinase, that is independently expressed in smooth muscle and functions to stabilize unphosphorylated myosin filaments. A pseudogene is located on the p arm of chromosome 3. Four transcript variants that produce four isoforms of the calcium/calmodulin dependent enzyme have been identified as well as two transcripts that produce two isoforms of telokin. Additional variants have been identified but lack full length transcripts. 4638 NA
NA ENSG00000261616 RP11-6O2.3 NA ENSG00000261616 NA
heat shock protein family B (small) member 8 ENSG00000152137 HSPB8 The protein encoded by this gene belongs to the superfamily of small heat-shock proteins containing a conservative alpha-crystallin domain at the C-terminal part of the molecule. The expression of this gene is induced by estrogen in estrogen receptor-positive breast cancer cells, and this protein also functions as a chaperone in association with Bag3, a stimulator of macroautophagy. Thus, this gene appears to be involved in regulation of cell proliferation, apoptosis, and carcinogenesis, and mutations in this gene have been associated with different neuromuscular diseases, including Charcot-Marie-Tooth disease. 26353 NA
MICAL like 1 ENSG00000100139 MICALL1 NA 85377 NA
coiled-coil domain containing 181 ENSG00000117477 CCDC181 NA 57821 NA
heat shock protein family A (Hsp70) member 2 ENSG00000126803 HSPA2 NA 3306 NA
retinoic acid receptor gamma ENSG00000172819 RARG This gene encodes a retinoic acid receptor that belongs to the nuclear hormone receptor family. Retinoic acid receptors (RARs) act as ligand-dependent transcriptional regulators. When bound to ligands, RARs activate transcription by binding as heterodimers to the retinoic acid response elements (RARE) found in the promoter regions of the target genes. In their unbound form, RARs repress transcription of their target genes. RARs are involved in various biological processes, including limb bud development, skeletal growth, and matrix homeostasis. Alternatively spliced transcript variants encoding different isoforms have been found for this gene. 5916 NA
transmembrane protein 132A ENSG00000006118 TMEM132A This gene encodes a protein that is highly similar to the rat Grp78-binding protein (GBP). Alternatively spliced transcript variants encoding different isoforms have been described. 54972 NA
mucin like 1 ENSG00000172551 MUCL1 NA 118430 NA
solute carrier family 25 member 25 ENSG00000148339 SLC25A25 The protein encoded by this gene belongs to the family of calcium-binding mitochondrial carriers, with a characteristic mitochondrial carrier domain at the C-terminus. These proteins are found in the inner membranes of mitochondria, and function as transport proteins. They shuttle metabolites, nucleotides and cofactors through the mitochondrial membrane and thereby connect and/or regulate cytoplasm and matrix functions. This protein may function as an ATP-Mg/Pi carrier that mediates the transport of Mg-ATP in exchange for phosphate, and is likely responsible for the net uptake or efflux of adenine nucleotides into or from the mitochondria. Alternatively spliced transcript variants encoding different isoforms with a common C-terminus but variable N-termini have been described for this gene. 114789 NA
creatine kinase B ENSG00000166165 CKB The protein encoded by this gene is a cytoplasmic enzyme involved in energy homeostasis. The encoded protein reversibly catalyzes the transfer of phosphate between ATP and various phosphogens such as creatine phosphate. It acts as a homodimer in brain as well as in other tissues, and as a heterodimer with a similar muscle isozyme in heart. The encoded protein is a member of the ATP:guanido phosphotransferase protein family. A pseudogene of this gene has been characterized. 1152 NA
carnitine palmitoyltransferase 1C ENSG00000169169 CPT1C This gene encodes a member of the carnitine/choline acetyltransferase family. The encoded protein regulates the beta-oxidation and transport of long-chain fatty acids into mitochondria, and may play a role in the regulation of feeding behavior and whole-body energy homeostasis. Alternatively spliced transcript variants encoding multiple protein isoforms have been observed for this gene. 126129 NA
ankyrin repeat domain 9 ENSG00000156381 ANKRD9 NA 122416 NA
NA ENSG00000272986 RP11-46J23.1 NA ENSG00000272986 NA
uncharacterized LOC101929777 ENSG00000108379 LOC101929777 NA 101929777 NA
Wnt family member 3 ENSG00000108379 WNT3 The WNT gene family consists of structurally related genes which encode secreted signaling proteins. These proteins have been implicated in oncogenesis and in several developmental processes, including regulation of cell fate and patterning during embryogenesis. This gene is a member of the WNT gene family. It encodes a protein which shows 98% amino acid identity to mouse Wnt3 protein, and 84% to human WNT3A protein, another WNT gene product. The mouse studies show the requirement of Wnt3 in primary axis formation in the mouse. Studies of the gene expression suggest that this gene may play a key role in some cases of human breast, rectal, lung, and gastric cancer through activation of the WNT-beta-catenin-TCF signaling pathway. This gene is clustered with WNT15, another family member, in the chromosome 17q21 region. 7473 NA
NA ENSG00000272084 RP5-1126H10.2 NA ENSG00000272084 NA
adenylate kinase 7 ENSG00000140057 AK7 NA 122481 NA
TEA domain transcription factor 3 ENSG00000007866 TEAD3 This gene product is a member of the transcriptional enhancer factor (TEF) family of transcription factors, which contain the TEA/ATTS DNA-binding domain. It is predominantly expressed in the placenta and is involved in the transactivation of the chorionic somatomammotropin-B gene enhancer. Translation of this protein is initiated at a non-AUG (AUA) start codon. 7005 NA
TNF alpha induced protein 8 like 1 ENSG00000185361 TNFAIP8L1 NA 126282 NA
cytochrome P450 family 3 subfamily A member 5 ENSG00000106258 CYP3A5 This gene encodes a member of the cytochrome P450 superfamily of enzymes. The cytochrome P450 proteins are monooxygenases which catalyze many reactions involved in drug metabolism and synthesis of cholesterol, steroids and other lipids. The encoded protein metabolizes drugs as well as the steroid hormones testosterone and progesterone. This gene is part of a cluster of cytochrome P450 genes on chromosome 7q21.1. Two pseudogenes of this gene have been identified within this cluster on chromosome 7. Expression of this gene is widely variable among populations, and a single nucleotide polymorphism that affects transcript splicing has been associated with susceptibility to hypertension. Alternative splicing results in multiple transcript variants. 1577 NA
5’-nucleotidase domain containing 3 ENSG00000111696 NT5DC3 NA 51559 NA
neurexophilin 3 ENSG00000182575 NXPH3 NA 11248 NA
PGAM family member 5, mitochondrial serine/threonine protein phosphatase ENSG00000247077 PGAM5 NA 192111 NA
V-set and immunoglobulin domain containing 2 ENSG00000019102 VSIG2 NA 23584 NA
NA ENSG00000262877 RP11-1055B8.4 NA ENSG00000262877 NA
tuftelin 1 ENSG00000143367 TUFT1 Tuftelin is an acidic protein that is thought to play a role in dental enamel mineralization and is implicated in caries susceptibility. It is also thought to be involved with adaptation to hypoxia, mesenchymal stem cell function, and neurotrophin nerve growth factor mediated neuronal differentiation. 7286 NA
malignant fibrous histiocytoma amplified sequence 1 ENSG00000147324 MFHAS1 Identified in a human 8p amplicon, this gene is a potential oncogene whose expression is enhanced in some malignant fibrous histiocytomas (MFH). The primary structure of its product includes an ATP/GTP-binding site, three leucine zipper domains, and a leucine-rich tandem repeat, which are structural or functional elements for interactions among proteins related to the cell cycle, and which suggest that overexpression might be oncogenic with respect to MFH. 9258 NA
transmembrane protein 79 ENSG00000163472 TMEM79 NA 84283 NA
DnaJ heat shock protein family (Hsp40) member B5 ENSG00000137094 DNAJB5 DNAJB5 belongs to the evolutionarily conserved DNAJ/HSP40 protein family. For background information on the DNAJ family, see MIM 608375. 25822 NA
NA ENSG00000260911 RP11-196G11.2 NA ENSG00000260911 NA
colony stimulating factor 3 ENSG00000108342 CSF3 The protein encoded by this gene is a cytokine that controls the production, differentiation, and function of granulocytes. The active protein is found extracellularly. Alternatively spliced transcript variants have been described for this gene. 1440 NA
tetraspanin 5 ENSG00000168785 TSPAN5 The protein encoded by this gene is a member of the transmembrane 4 superfamily, also known as the tetraspanin family. Most of these members are cell-surface proteins that are characterized by the presence of four hydrophobic domains. The proteins mediate signal transduction events that play a role in the regulation of cell development, activation, growth and motility. 10098 NA
NA ENSG00000263335 AF001548.5 NA ENSG00000263335 NA
interleukin 34 ENSG00000157368 IL34 Interleukin-34 is a cytokine that promotes the differentiation and viability of monocytes and macrophages through the colony-stimulating factor-1 receptor (CSF1R; MIM 164770) (Lin et al., 2008 [PubMed 18467591]). 146433 NA
cholinergic receptor nicotinic alpha 5 subunit ENSG00000169684 CHRNA5 The protein encoded by this gene is a nicotinic acetylcholine receptor subunit and a member of a superfamily of ligand-gated ion channels that mediate fast signal transmission at synapses. These receptors are thought to be heteropentamers composed of separate but similar subunits. Defects in this gene have been linked to susceptibility to lung cancer type 2 (LNCR2). 1138 NA
major facilitator superfamily domain containing 2A ENSG00000168389 MFSD2A NA 84879 NA
SERPINE1 mRNA binding protein 1 pseudogene 3 ENSG00000242142 SERBP1P3 NA ENSG00000242142 NA
NA ENSG00000182319 NA NA NA TRUE
hes family bHLH transcription factor 6 ENSG00000144485 HES6 This gene encodes a member of a subfamily of basic helix-loop-helix transcription repressors that have homology to the Drosophila enhancer of split genes. Members of this gene family regulate cell differentiation in numerous cell types. The protein encoded by this gene functions as a cofactor, interacting with other transcription factors through a tetrapeptide domain in its C-terminus. Alternatively spliced transcript variants encoding different isoforms have been described. 55502 NA
repulsive guidance molecule family member b ENSG00000174136 RGMB RGMB is a glycosylphosphatidylinositol (GPI)-anchored member of the repulsive guidance molecule family (see RGMA, MIM 607362) and contributes to the patterning of the developing nervous system (Samad et al., 2005 [PubMed 15671031]). 285704 NA
NA ENSG00000270605 RP5-1092A3.4 NA ENSG00000270605 NA
synuclein alpha interacting protein ENSG00000064692 SNCAIP This gene encodes a protein containing several protein-protein interaction domains, including ankyrin-like repeats, a coiled-coil domain, and an ATP/GTP-binding motif. The encoded protein interacts with alpha-synuclein in neuronal tissue and may play a role in the formation of cytoplasmic inclusions and neurodegeneration. A mutation in this gene has been associated with Parkinson’s disease. Alternative splicing results in multiple transcript variants. 9627 NA
solute carrier family 45 member 3 ENSG00000158715 SLC45A3 NA 85414 NA
prostaglandin E synthase 3 (cytosolic)-like ENSG00000267060 PTGES3L NA 100885848 NA
solute carrier family 7 member 5 ENSG00000103257 SLC7A5 NA 8140 NA
CD200 receptor 1 ENSG00000163606 CD200R1 This gene encodes a receptor for the OX-2 membrane glycoprotein. Both the receptor and substrate are cell surface glycoproteins containing two immunoglobulin-like domains. This receptor is restricted to the surfaces of myeloid lineage cells and the receptor-substrate interaction may function as a myeloid downregulatory signal. Mouse studies of a related gene suggest that this interaction may control myeloid function in a tissue-specific manner. Alternative splicing of this gene results in multiple transcript variants. 131450 NA
macrophage stimulating 1-like ENSG00000186715 MST1L NA ENSG00000186715 NA
NA ENSG00000260466 RP4-536B24.2 NA ENSG00000260466 NA
LSM11, U7 small nuclear RNA associated ENSG00000155858 LSM11 NA 134353 NA
plasminogen-like B1 ENSG00000183281 PLGLB1 NA 5343 NA
oligodendrocyte myelin glycoprotein ENSG00000126861 OMG NA 4974 NA
uncharacterized LOC102723927 ENSG00000261186 LOC102723927 NA 102723927 NA
ribosomal protein S20 pseudogene 21 ENSG00000244295 RPS20P21 NA ENSG00000244295 NA
natural killer cell cytotoxicity receptor 3 ligand 1 ENSG00000188211 NCR3LG1 B7H6 belongs to the B7 family (see MIM 605402) and is selectively expressed on tumor cells. Interaction of B7H6 with NKp30 (NCR3; MIM 611550) results in natural killer (NK) cell activation and cytotoxicity (Brandt et al., 2009 [PubMed 19528259]). 374383 NA
calpain 5 ENSG00000149260 CAPN5 Calpains are calcium-dependent cysteine proteases involved in signal transduction in a variety of cellular processes. A functional calpain protein consists of an invariant small subunit and 1 of a family of large subunits. CAPN5 is one of the large subunits. Unlike some of the calpains, CAPN5 and CAPN6 lack a calmodulin-like domain IV. Because of the significant similarity to Caenorhabditis elegans sex determination gene tra-3, CAPN5 is also called HTRA3. 726 NA
keratin 19 ENSG00000171345 KRT19 The protein encoded by this gene is a member of the keratin family. The keratins are intermediate filament proteins responsible for the structural integrity of epithelial cells and are subdivided into cytokeratins and hair keratins. The type I cytokeratins consist of acidic proteins which are arranged in pairs of heterotypic keratin chains. Unlike its related family members, this smallest known acidic cytokeratin is not paired with a basic cytokeratin in epithelial cells. It is specifically expressed in the periderm, the transiently superficial layer that envelopes the developing epidermis. The type I cytokeratins are clustered in a region of chromosome 17q12-q21. 3880 NA
period circadian clock 2 ENSG00000132326 PER2 This gene is a member of the Period family of genes and is expressed in a circadian pattern in the suprachiasmatic nucleus, the primary circadian pacemaker in the mammalian brain. Genes in this family encode components of the circadian rhythms of locomotor activity, metabolism, and behavior. This gene is upregulated by CLOCK/ARNTL heterodimers but then represses this upregulation in a feedback loop using PER/CRY heterodimers to interact with CLOCK/ARNTL. Polymorphisms in this gene may increase the risk of getting certain cancers and have been linked to sleep disorders. 8864 NA
NA ENSG00000267194 RP1-193H18.2 NA ENSG00000267194 NA
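# Save the queried Ensembl gene IDs for factor 5, one ID per line, with no header, row names, or quotes.
write.table(as.factor(out$query), paste0("../utilities/GTEX2013_sparse_load_voom/gene_names_clus_",5,".txt"), col.names = FALSE,
            row.names=FALSE, quote=FALSE);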
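
The Factor 5 table above contains a query that returns two hits (ENSG00000108379 maps to both LOC101929777 and WNT3) and queries flagged `notfound = TRUE`, and the column order of the printed tables varies between factors. The following is a minimal sketch of one way to tidy such a result before printing it or writing the IDs out. The helper name `tidy_annotations` and the toy data frame are hypothetical and are not part of the original analysis; the toy rows reuse IDs, names, and symbols from the table above, with summaries omitted for brevity.

```r
# Hypothetical helper (an assumption, not the original workflow): tidy a
# mygene-style result table before printing it or exporting gene IDs.
tidy_annotations <- function(out) {
  df <- as.data.frame(out, stringsAsFactors = FALSE)
  # Rows with notfound = TRUE carry no annotation; NA means the query matched.
  if ("notfound" %in% names(df)) {
    df <- df[is.na(df$notfound) | !df$notfound, , drop = FALSE]
  }
  # Use one fixed column order, keeping only the columns that are present.
  wanted <- c("query", "symbol", "name", "summary")
  df <- df[, intersect(wanted, names(df)), drop = FALSE]
  rownames(df) <- NULL
  df
}

# Toy stand-in for `out` (summaries left as NA to keep the example short).
toy <- data.frame(
  name = c("uncharacterized LOC101929777", "Wnt family member 3", NA, "calponin 1"),
  query = c("ENSG00000108379", "ENSG00000108379", "ENSG00000257499", "ENSG00000130176"),
  symbol = c("LOC101929777", "WNT3", NA, "CNN1"),
  summary = NA_character_,
  notfound = c(NA, NA, TRUE, NA),
  stringsAsFactors = FALSE
)

tidy <- tidy_annotations(toy)

# One entry per matched Ensembl ID, even when a query returned several hits.
gene_ids <- unique(tidy$query)
```

Under these assumptions, `tidy` could be passed to `kable()` and `gene_ids` to `write.table()` in place of `out$query`. Whether unmatched or repeated IDs should be dropped depends on what the downstream tools expect, so this is a design choice rather than a required step.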

Factor 6 Annotations

out <- mygene::queryMany(gene_list[6,],  scopes="ensembl.gene", fields=c("name", "summary", "symbol"), species="human");
## Finished
## Pass returnall=TRUE to return lists of duplicate or missing query terms.
kable(as.data.frame(out))
summary X_id query symbol name notfound
The protein encoded by this gene is a cytokine that controls the production, differentiation, and function of granulocytes. The active protein is found extracellularly. Alternatively spliced transcript variants have been described for this gene. 1440 ENSG00000108342 CSF3 colony stimulating factor 3 NA
Activins are dimeric growth and differentiation factors which belong to the transforming growth factor-beta (TGF-beta) superfamily of structurally related signaling proteins. Activins signal through a heteromeric complex of receptor serine kinases which include at least two type I (I and IB) and two type II (II and IIB) receptors. These receptors are all transmembrane proteins, composed of a ligand-binding extracellular domain with cysteine-rich region, a transmembrane domain, and a cytoplasmic domain with predicted serine/threonine specificity. Type I receptors are essential for signaling; and type II receptors are required for binding ligands and for expression of type I receptors. Type I and II receptors form a stable complex after ligand binding, resulting in phosphorylation of type I receptors by type II receptors. This gene encodes activin A type I receptor which signals a particular transcriptional response in concert with activin type II receptors. Mutations in this gene are associated with fibrodysplasia ossificans progressiva. 90 ENSG00000115170 ACVR1 activin A receptor type 1 NA
The full-length protein encoded by this gene is an intracellular tetrapyrrole-binding protein. This protein includes a natural chemoattractant peptide of 21 amino acids at the N-terminus, which is a natural ligand for formyl peptide receptor-like receptor 2 (FPRL2) and promotes calcium mobilization and chemotaxis in monocytes and dendritic cells. 50865 ENSG00000013583 HEBP1 heme binding protein 1 NA
This gene encodes the insulin receptor substrate 2, a cytoplasmic signaling molecule that mediates effects of insulin, insulin-like growth factor 1, and other cytokines by acting as a molecular adaptor between diverse receptor tyrosine kinases and downstream effectors. The product of this gene is phosphorylated by the insulin receptor tyrosine kinase upon receptor stimulation, as well as by an interleukin 4 receptor-associated kinase in response to IL4 treatment. 8660 ENSG00000185950 IRS2 insulin receptor substrate 2 NA
This gene encodes a member of the low-density lipoprotein receptor family of proteins. The encoded preproprotein is proteolytically processed by furin to generate 515 kDa and 85 kDa subunits that form the mature receptor (PMID: 8546712). This receptor is involved in several cellular processes, including intracellular signaling, lipid homeostasis, and clearance of apoptotic cells. In addition, the encoded protein is necessary for the alpha 2-macroglobulin-mediated clearance of secreted amyloid precursor protein and beta-amyloid, the main component of amyloid plaques found in Alzheimer patients. Expression of this gene decreases with age and has been found to be lower than controls in brain tissue from Alzheimer’s disease patients. 4035 ENSG00000123384 LRP1 LDL receptor related protein 1 NA
NA NA ENSG00000255813 NA NA TRUE
This gene is thought to play an important role in calcium homeostasis. The gene is expressed from two promoters and undergoes extensive alternative splicing. The encoded set of proteins share varying amounts of overlap near their N-termini but have substantial variations in their C-terminal domains resulting in distinct functional properties. The longest isoforms (a and f) include a C-terminal Aspartyl/Asparaginyl beta-hydroxylase domain that hydroxylates aspartic acid or asparagine residues in the epidermal growth factor (EGF)-like domains of some proteins, including protein C, coagulation factors VII, IX, and X, and the complement factors C1R and C1S. Other isoforms differ primarily in the C-terminal sequence and lack the hydroxylase domain, and some have been localized to the endoplasmic and sarcoplasmic reticulum. Some of these isoforms are found in complexes with calsequestrin, triadin, and the ryanodine receptor, and have been shown to regulate calcium release from the sarcoplasmic reticulum. Some isoforms have been implicated in metastasis. 444 ENSG00000198363 ASPH aspartate beta-hydroxylase NA
The protein encoded by this gene is a member of the ankyrin repeat and SOCS box-containing (ASB) family of proteins. They contain ankyrin repeat sequence and a SOCS box domain. The SOCS box serves to couple suppressor of cytokine signalling (SOCS) proteins and their binding partners with the elongin B and C complex, possibly targeting them for degradation. Multiple alternatively spliced transcript variants, both protein-coding and not protein-coding, have been described for this gene. 79754 ENSG00000196372 ASB13 ankyrin repeat and SOCS box containing 13 NA
The RAB5 protein is a small GTPase involved in membrane trafficking in the early endocytic pathway. The protein encoded by this gene binds the GTP-bound form of the RAB5 protein preferentially over the GDP-bound form, and functions as a guanine nucleotide exchange factor for RAB5. The encoded protein is found primarily as a tetramer in the cytoplasm and does not bind other members of the RAB family. Mutations in this gene cause macrocephaly alopecia cutis laxa and scoliosis (MACS) syndrome, an elastic tissue disorder, as well as the related connective tissue disorder, RIN2 syndrome. Alternative splicing results in multiple transcript variants. 54453 ENSG00000132669 RIN2 Ras and Rab interactor 2 NA
NA ENSG00000233547 ENSG00000233547 RP11-57H14.2 NA NA
NA 83699 ENSG00000198478 SH3BGRL2 SH3 domain binding glutamate rich protein like 2 NA
This gene encodes a member of the thioredoxin family of enzymes. It is a cytosolic and ubiquitously expressed flavoprotein that catalyzes the two-electron reduction of quinone substrates and uses dihydronicotinamide riboside as a reducing coenzyme. Mutations in this gene have been associated with neurodegenerative diseases and several cancers. Alternative splicing results in multiple transcript variants. 4835 ENSG00000124588 NQO2 NAD(P)H quinone dehydrogenase 2 NA
NA 115548 ENSG00000157107 FCHO2 FCH domain only 2 NA
NA 9788 ENSG00000170873 MTSS1 metastasis suppressor 1 NA
The protein encoded by this gene contains a RING finger motif and is similar to g1, a Drosophila zinc-finger protein that is expressed in mesoderm and involved in embryonic development. The expression of the mouse counterpart was found to be upregulated in myeloblastic cells following IL3 deprivation, suggesting that this gene may regulate growth factor withdrawal-induced apoptosis of myeloid precursor cells. Alternative splicing results in multiple transcript variants. 55819 ENSG00000113269 RNF130 ring finger protein 130 NA
The protein encoded by this gene belongs to a small group of evolutionarily conserved proteins with three transmembrane domains. It is a potential target for ubiquitination by the Nedd4 family of proteins. This protein is thought to be part of a family of integral Golgi membrane proteins. 80762 ENSG00000131507 NDFIP1 Nedd4 family interacting protein 1 NA
This gene is a member of the MAD gene family. The MAD genes encode basic helix-loop-helix-leucine zipper proteins that heterodimerize with MAX protein, forming a transcriptional repression complex. The MAD proteins compete for MAX binding with MYC, which heterodimerizes with MAX forming a transcriptional activation complex. Studies in rodents suggest that the MAD genes are tumor suppressors and contribute to the regulation of cell growth in differentiating tissues. 10608 ENSG00000123933 MXD4 MAX dimerization protein 4 NA
NA ENSG00000263640 ENSG00000263640 AF235103.1 NA NA
NA 146547 ENSG00000178226 PRSS36 protease, serine 36 NA
NA ENSG00000267543 ENSG00000267543 RP11-666A8.7 NA NA
This gene encodes a threonine synthase-like protein. A similar enzyme in mouse can catalyze the degradation of O-phospho-homoserine to a-ketobutyrate, phosphate, and ammonia. This protein also has phospho-lyase activity on both gamma and beta phosphorylated substrates. In mouse an alternatively spliced form of this protein has been shown to act as a cytokine and can induce the production of the inflammatory cytokine IL6 in osteoblasts. Alternate splicing results in multiple transcript variants. 55258 ENSG00000144115 THNSL2 threonine synthase like 2 NA
The protein encoded by this gene is essential for bone resorption, and may play a critical role in vesicular transport in the osteoclast. Mutations in this gene are associated with autosomal recessive osteopetrosis type 6 (OPTB6). Alternatively spliced transcript variants have been found for this gene. 9842 ENSG00000225190 PLEKHM1 pleckstrin homology and RUN domain containing M1 NA
NA ENSG00000260306 ENSG00000260306 RP11-645C24.5 NA NA
NA NA ENSG00000264043 NA NA TRUE
This gene encodes a member of the growth arrest-specific 2 protein family. This protein binds components of the cytoskeleton and may be involved in mediating interactions between microtubules and microfilaments. Alternate splicing results in multiple transcript variants. A pseudogene of this gene is found on chromosome 9. 10634 ENSG00000185340 GAS2L1 growth arrest specific 2 like 1 NA
This gene is an ortholog of the C. elegans unc-76 gene, which is necessary for normal axonal bundling and elongation within axon bundles. Other orthologs include the rat gene that encodes zygin II, which can bind to synaptotagmin. 9637 ENSG00000171055 FEZ2 fasciculation and elongation protein zeta 2 NA
NA 100507103 ENSG00000230537 LOC100507103 uncharacterized LOC100507103 NA
This gene encodes a member of the C1 family of peptidases. Alternative splicing of this gene results in multiple transcript variants. At least one of these variants encodes a preproprotein that is proteolytically processed to generate multiple protein products. These products include the cathepsin B light and heavy chains, which can dimerize to form the double chain form of the enzyme. This enzyme is a lysosomal cysteine protease with both endopeptidase and exopeptidase activity that may play a role in protein turnover. It is also known as amyloid precursor protein secretase and is involved in the proteolytic processing of amyloid precursor protein (APP). Incomplete proteolytic processing of APP has been suggested to be a causative factor in Alzheimer’s disease, the most common cause of dementia. Overexpression of the encoded protein has been associated with esophageal adenocarcinoma and other tumors. Multiple pseudogenes of this gene have been identified. 1508 ENSG00000164733 CTSB cathepsin B NA
This gene encodes a member of the pyruvate dehydrogenase kinase family. The encoded protein phosphorylates pyruvate dehydrogenase, down-regulating the activity of the mitochondrial pyruvate dehydrogenase complex. Overexpression of this gene may play a role in both cancer and diabetes. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene. 5164 ENSG00000005882 PDK2 pyruvate dehydrogenase kinase 2 NA
This locus encodes a guanine nucleotide-binding protein. The encoded protein, an alpha subunit in the Gq class, couples a seven-transmembrane domain receptor to activation of phospholipase C-beta. Mutations at this locus have been associated with problems in platelet activation and aggregation. A related pseudogene exists on chromosome 2. 2776 ENSG00000156052 GNAQ G protein subunit alpha q NA
This gene encodes a lysosomal protein that interacts with RAB7, a small GTPase that controls transport to endocytic degradative compartments. Studies using mutant forms of the two proteins suggest that this protein represents a downstream effector for RAB7, and both proteins act together in the regulation of late endocytic traffic. A unique region of this protein has also been shown to be involved in the regulation of lysosomal morphology. 83547 ENSG00000167705 RILP Rab interacting lysosomal protein NA
This gene encodes a pseudophosphatase and member of the myotubularin-related protein family. This gene maps within the CMT4B2 candidate region of chromosome 11p15 and mutations in this gene have been associated with Charcot-Marie-Tooth Disease, type 4B2. 81846 ENSG00000133812 SBF2 SET binding factor 2 NA
The protein encoded by this gene is a cytosolic protein which contains a phosphotyrosine binding (PTD) domain. The PTD domain has been found to interact with the cytoplasmic tail of the LDL receptor. Mutations in this gene lead to LDL receptor malfunction and cause the disorder autosomal recessive hypercholesterolaemia. 26119 ENSG00000157978 LDLRAP1 low density lipoprotein receptor adaptor protein 1 NA
NA 23129 ENSG00000004399 PLXND1 plexin D1 NA
NA ENSG00000261269 ENSG00000261269 RP11-389C8.2 NA NA
This gene encodes a member of the trypsin family of serine proteases. This protein is a secreted enzyme that is proposed to regulate the availability of insulin-like growth factors (IGFs) by cleaving IGF-binding proteins. It has also been suggested to be a regulator of cell growth. Variations in the promoter region of this gene are the cause of susceptibility to age-related macular degeneration type 7. 5654 ENSG00000166033 HTRA1 HtrA serine peptidase 1 NA
This gene encodes a serine/threonine protein kinase that localizes to mitochondria. It is thought to protect cells from stress-induced mitochondrial dysfunction. Mutations in this gene cause one form of autosomal recessive early-onset Parkinson disease. 65018 ENSG00000158828 PINK1 PTEN induced putative kinase 1 NA
NA NA ENSG00000256845 NA NA TRUE
Angiomotin is a protein that binds angiostatin, a circulating inhibitor of the formation of new blood vessels (angiogenesis). Angiomotin mediates angiostatin inhibition of endothelial cell migration and tube formation in vitro. The protein encoded by this gene is related to angiomotin and is a member of the motin protein family. Alternative splicing results in multiple transcript variants of this gene. 51421 ENSG00000114019 AMOTL2 angiomotin like 2 NA
NA 100505635 ENSG00000235033 LOC100505635 uncharacterized LOC100505635 NA
NA 26035 ENSG00000138604 GLCE glucuronic acid epimerase NA
This gene encodes a member of the transient receptor potential (TRP) cation channel gene family. The transmembrane protein localizes to intracellular vesicular membranes including lysosomes, and functions in the late endocytic pathway and in the regulation of lysosomal exocytosis. The channel is permeable to Ca(2+), Fe(2+), Na(+), K(+), and H(+), and is modulated by changes in Ca(2+) concentration. Mutations in this gene result in mucolipidosis type IV. 57192 ENSG00000090674 MCOLN1 mucolipin 1 NA
NA 64798 ENSG00000155792 DEPTOR DEP domain containing MTOR-interacting protein NA
NA 404093 ENSG00000180891 CUEDC1 CUE domain containing 1 NA
This gene encodes a coiled-coil and calcium binding domain protein that appears to play a critical role in cilia formation. Mutations in this gene cause Meckel syndrome type 6, as well as Joubert syndrome type 9. Alternative splicing results in multiple transcript variants. 57545 ENSG00000048342 CC2D2A coiled-coil and C2 domain containing 2A NA
The protein encoded by this gene is a member of the tripartite motif (TRIM) family. The TRIM motif includes three zinc-binding domains, a RING, a B-box type 1 and a B-box type 2, and a coiled-coil region. The protein localizes to cytoplasmic filaments. It plays a neuroprotective role and functions as an E3-ubiquitin ligase in proteasome-mediated degradation of target proteins. Mutations in this gene can cause early-onset axonal neuropathy. Alternative splicing results in multiple transcript variants. 23321 ENSG00000109654 TRIM2 tripartite motif containing 2 NA
NA 146880 ENSG00000215769 LOC146880 Rho GTPase activating protein 27 pseudogene NA
NA 57037 ENSG00000106524 ANKMY2 ankyrin repeat and MYND domain containing 2 NA
NA ENSG00000267546 ENSG00000267546 RP11-666A8.8 NA NA
NA ENSG00000273219 ENSG00000273219 RP11-644N4.1 NA NA
This gene encodes a membrane-bound protein from the major facilitator superfamily of transporters. Disruption of this gene by translocation has been associated with haplo-insufficiency and renal cell carcinomas. Alternatively spliced transcript variants have been described, but their biological validity has not yet been determined. 84925 ENSG00000138463 DIRC2 disrupted in renal carcinoma 2 NA
The protein encoded by this gene is a DNA-binding, leucine zipper-containing transcription factor that acts as a homodimer or as a heterodimer. Depending on the binding site and binding partner, the encoded protein can be a transcriptional activator or repressor. This protein plays a role in the regulation of several cellular processes, including embryonic lens fiber cell development, increased T-cell susceptibility to apoptosis, and chondrocyte terminal differentiation. Defects in this gene are a cause of juvenile-onset pulverulent cataract as well as congenital cerulean cataract 4 (CCA4). Two transcript variants encoding different isoforms have been found for this gene. 4094 ENSG00000178573 MAF MAF bZIP transcription factor NA
NA 54621 ENSG00000176834 VSIG10 V-set and immunoglobulin domain containing 10 NA
Rho GTPases play a fundamental role in numerous cellular processes that are initiated by extracellular stimuli that work through G protein coupled receptors. The encoded protein may form a complex with G proteins and stimulate Rho-dependent signals. A similar protein in rat interacts with glutamate transporter EAAT4 and modulates its glutamate transport activity. Expression of the rat protein induces the reorganization of the actin cytoskeleton and its overexpression induces the formation of membrane ruffling and filopodia. Two alternative transcripts encoding different isoforms have been described. 9826 ENSG00000132694 ARHGEF11 Rho guanine nucleotide exchange factor 11 NA
NA 57333 ENSG00000142552 RCN3 reticulocalbin 3 NA
This gene encodes amyloid precursor-like protein 2 (APLP2), which is a member of the APP (amyloid precursor protein) family including APP, APLP1 and APLP2. This protein is ubiquitously expressed. It contains heparin-, copper- and zinc-binding domains at the N-terminus, BPTI/Kunitz inhibitor and E2 domains in the middle region, and transmembrane and intracellular domains at the C-terminus. This protein interacts with major histocompatibility complex (MHC) class I molecules. The synergy of this protein and the APP is required to mediate neuromuscular transmission, spatial learning and synaptic plasticity. This protein has been implicated in the pathogenesis of Alzheimer’s disease. Multiple alternatively spliced transcript variants encoding different isoforms have been identified. 334 ENSG00000084234 APLP2 amyloid beta precursor like protein 2 NA
NA 149076 ENSG00000160094 ZNF362 zinc finger protein 362 NA
Prenylcysteine is released during the degradation of prenylated proteins. PCYOX1 catalyzes the degradation of prenylcysteine to yield free cysteines and a hydrophobic isoprenoid product (Tschantz et al., 1999 [PubMed 10585463]). 51449 ENSG00000116005 PCYOX1 prenylcysteine oxidase 1 NA
NA 285512 ENSG00000248019 FAM13A-AS1 FAM13A antisense RNA 1 NA
This gene encodes a cytosolic enzyme that catalyzes the activation of acetate for use in lipid synthesis and energy generation. The protein acts as a monomer and produces acetyl-CoA from acetate in a reaction that requires ATP. Expression of this gene is regulated by sterol regulatory element-binding proteins, transcription factors that activate genes required for the synthesis of cholesterol and unsaturated fatty acids. Alternative splicing results in multiple transcript variants. 55902 ENSG00000131069 ACSS2 acyl-CoA synthetase short-chain family member 2 NA
This gene encodes a deoxyribonucleoside kinase that specifically phosphorylates thymidine, deoxycytidine, and deoxyuridine. The encoded enzyme localizes to the mitochondria and is required for mitochondrial DNA synthesis. Mutations in this gene are associated with a myopathic form of mitochondrial DNA depletion syndrome. Alternate splicing results in multiple transcript variants encoding distinct isoforms, some of which lack transit peptide, so are not localized to mitochondria. 7084 ENSG00000166548 TK2 thymidine kinase 2, mitochondrial NA
NA 10079 ENSG00000054793 ATP9A ATPase phospholipid transporting 9A (putative) NA
NA 221442 ENSG00000161912 ADCY10P1 adenylate cyclase 10 (soluble) pseudogene 1 NA
The protein encoded by this gene is a type II integral membrane protein that belongs to the 3-O-sulfotransferases family. These proteins catalyze the addition of sulfate groups at the 3-OH position of glucosamine in heparan sulfate. The substrate specificity of individual members of the family is based on prior modification of the heparan sulfate chain, thus allowing different members of the family to generate binding sites for different proteins on the same heparan sulfate chain. Following treatment with a histone deacetylase inhibitor, expression of this gene is activated in a pancreatic cell line. The increased expression results in promotion of the epithelial-mesenchymal transition. In addition, the modification catalyzed by this protein allows herpes simplex virus membrane fusion and penetration. A very closely related homolog with an almost identical sulfotransferase domain maps less than 1 Mb away. Alternative splicing results in multiple transcript variants. 9953 ENSG00000125430 HS3ST3B1 heparan sulfate-glucosamine 3-sulfotransferase 3B1 NA
NA ENSG00000227201 ENSG00000227201 CNN2P1 calponin 2 pseudogene 1 NA
NA ENSG00000261064 ENSG00000261064 RP11-1000B6.3 NA NA
C6ORF49 is a member of the LIM domain protein family (Teufel et al., 2005 [PubMed 15702247]). 29964 ENSG00000124593 PRICKLE4 prickle planar cell polarity protein 4 NA
NA ENSG00000200278 ENSG00000200278 RNA5SP352 RNA, 5S ribosomal pseudogene 352 NA
NA 80221 ENSG00000167107 ACSF2 acyl-CoA synthetase family member 2 NA
NA NA ENSG00000230633 NA NA TRUE
This gene encodes the mitochondrial enzyme ornithine aminotransferase, which is a key enzyme in the pathway that converts arginine and ornithine into the major excitatory and inhibitory neurotransmitters glutamate and GABA. Mutations that result in a deficiency of this enzyme cause the autosomal recessive eye disease Gyrate Atrophy. Alternatively spliced transcript variants encoding different isoforms have been described. Related pseudogenes have been defined on the X chromosome. 4942 ENSG00000065154 OAT ornithine aminotransferase NA
This gene encodes a member of the SOX (SRY-related HMG-box) family of transcription factors involved in the regulation of embryonic development and in the determination of cell fate. The encoded protein may act as a transcriptional regulator after forming a protein complex with other proteins. It has also been determined to be a type-1 diabetes autoantigen, also known as islet cell antibody 12. 9580 ENSG00000143842 SOX13 SRY-box 13 NA
The protein encoded by this gene is a member of the gelsolin/villin family of actin regulatory proteins. This protein has structural similarity to villin. It binds actin and may play a role in the development of neuronal cells that form ganglia. 10677 ENSG00000135407 AVIL advillin NA
The protein encoded by this gene acts as a homodimer, using ATP hydrolysis to catalyze the conversion of 5-oxo-L-proline to L-glutamate. Defects in this gene are a cause of 5-oxoprolinase deficiency (OPLAHD). 26873 ENSG00000178814 OPLAH 5-oxoprolinase (ATP-hydrolysing) NA
NA 54884 ENSG00000042445 RETSAT retinol saturase NA
This gene encodes a cytoskeletal protein involved in actin-membrane attachment at sites of cell adhesion to the extracellular matrix (focal adhesion). Alternatively spliced transcript variants encoding different isoforms have been described for this gene. These isoforms exhibit different expression patterns and have different biochemical as well as physiological properties (PMID:9054445). 5829 ENSG00000089159 PXN paxillin NA
This gene encodes an enzyme which removes 9-O-acetylation modifications from sialic acids. Mutations in this gene are associated with susceptibility to autoimmune disease 6. Multiple transcript variants encoding different isoforms, found either in the cytosol or in the lysosome, have been found for this gene. 54414 ENSG00000110013 SIAE sialic acid acetylesterase NA
NA 57515 ENSG00000111897 SERINC1 serine incorporator 1 NA
NA ENSG00000255857 ENSG00000255857 PXN-AS1 PXN antisense RNA 1 NA
NA NA ENSG00000256142 NA NA TRUE
This gene belongs to the chemokine-like factor gene superfamily, a novel family that is similar to the chemokine and the transmembrane 4 superfamilies of signaling molecules. This gene is one of several chemokine-like factor genes located in a cluster on chromosome 16. Alternatively spliced transcript variants encoding different isoforms have been identified. 146223 ENSG00000183723 CMTM4 CKLF like MARVEL transmembrane domain containing 4 NA
NA 81553 ENSG00000197872 FAM49A family with sequence similarity 49 member A NA
NA NA ENSG00000272091 NA NA TRUE
The product of this gene belongs to the Serine/Threonine protein kinase family, and to the Ca(2+)/calmodulin-dependent protein kinase subfamily. The major isoform of this gene plays a role in the calcium/calmodulin-dependent (CaM) kinase cascade by phosphorylating the downstream kinases CaMK1 and CaMK4. Protein products of this gene also phosphorylate AMP-activated protein kinase (AMPK). This gene has its strongest expression in the brain and influences signalling cascades involved with learning and memory, neuronal differentiation and migration, neurite outgrowth, and synapse formation. Alternative splicing results in multiple transcript variants encoding distinct isoforms. The identified isoforms differ in their ability to undergo autophosphorylation and to phosphorylate downstream kinases. 10645 ENSG00000110931 CAMKK2 calcium/calmodulin-dependent protein kinase kinase 2 NA
NA ENSG00000269976 ENSG00000269976 RP11-130L8.2 NA NA
This gene encodes an alpha chain for one of the low abundance fibrillar collagens. Fibrillar collagen molecules are trimers that can be composed of one or more types of alpha chains. Type V collagen is found in tissues containing type I collagen and appears to regulate the assembly of heterotypic fibers composed of both type I and type V collagen. This gene product is closely related to type XI collagen and it is possible that the collagen chains of types V and XI constitute a single collagen type with tissue-specific chain combinations. Mutations in this gene are thought to be responsible for the symptoms of a subset of patients with Ehlers-Danlos syndrome type III. Messages of several sizes can be detected in northern blots but sequence information cannot confirm the identity of the shorter messages. 50509 ENSG00000080573 COL5A3 collagen type V alpha 3 NA
Syntrophins are cytoplasmic peripheral membrane scaffold proteins that are components of the dystrophin-associated protein complex. This gene is a member of the syntrophin gene family and encodes the most common syntrophin isoform found in cardiac tissues. The N-terminal PDZ domain of this syntrophin protein interacts with the C-terminus of the pore-forming alpha subunit (SCN5A) of the cardiac sodium channel Nav1.5. This protein also associates cardiac sodium channels with the nitric oxide synthase-PMCA4b (plasma membrane Ca-ATPase subtype 4b) complex in cardiomyocytes. This gene is a susceptibility locus for Long-QT syndrome (LQT) - an inherited disorder associated with sudden cardiac death from arrhythmia - and sudden infant death syndrome (SIDS). This protein also associates with dystrophin and dystrophin-related proteins at the neuromuscular junction and alters intracellular calcium ion levels in muscle tissue. 6640 ENSG00000101400 SNTA1 syntrophin alpha 1 NA
The protein encoded by this gene binds to the ‘plus’ ends of actin monomers and filaments to prevent monomer exchange. The encoded calcium-regulated protein functions in both assembly and disassembly of actin filaments. Defects in this gene are a cause of familial amyloidosis Finnish type (FAF). Multiple transcript variants encoding several different isoforms have been found for this gene. 2934 ENSG00000148180 GSN gelsolin NA
NA 55652 ENSG00000211584 SLC48A1 solute carrier family 48 member 1 NA
NA ENSG00000254317 ENSG00000254317 RP11-473O4.5 NA NA
NA ENSG00000237781 ENSG00000237781 RP11-54A4.2 NA NA
NA 146691 ENSG00000175662 TOM1L2 target of myb1 like 2 membrane trafficking protein NA
This gene encodes a phox (PX) domain-containing protein which may be involved in synaptic transmission and the ligand-induced internalization and degradation of epidermal growth factors. Variations in this gene may be associated with susceptibility to systemic lupus erythematosus (SLE). Alternative splicing results in multiple transcript variants. 54899 ENSG00000168297 PXK PX domain containing serine/threonine kinase like NA
This gene encodes a protein similar to guanosine nucleotide exchange factors for Rho GTPases. The encoded protein contains in its C-terminus a GEF domain involved in exchange activity and a pleckstrin homology domain. Alternatively spliced transcripts that encode different proteins have been described. 55701 ENSG00000165801 ARHGEF40 Rho guanine nucleotide exchange factor 40 NA
NA 11328 ENSG00000122642 FKBP9 FK506 binding protein 9 NA
This gene encodes a dual serine/threonine and tyrosine protein kinase which is expressed in multiple tissues. It is thought to function as a regulator of cell death. Multiple transcript variants encoding different isoforms have been found for this gene. 25778 ENSG00000133059 DSTYK dual serine/threonine and tyrosine protein kinase NA
This gene encodes a highly conserved preproprotein that is proteolytically processed to generate four main cleavage products including saposins A, B, C, and D. Each domain of the precursor protein is approximately 80 amino acid residues long with nearly identical placement of cysteine residues and glycosylation sites. Saposins A-D localize primarily to the lysosomal compartment where they facilitate the catabolism of glycosphingolipids with short oligosaccharide groups. The precursor protein exists both as a secretory protein and as an integral membrane protein and has neurotrophic activities. Mutations in this gene have been associated with Gaucher disease and metachromatic leukodystrophy. Alternative splicing results in multiple transcript variants, at least one of which encodes an isoform that is proteolytically processed. 5660 ENSG00000197746 PSAP prosaposin NA
NA 387680 ENSG00000099290 FAM21A family with sequence similarity 21 member A NA
NA 100134229 ENSG00000260231 JHDM1D-AS1 JHDM1D antisense RNA 1 (head to head) NA
NA 122953 ENSG00000140044 JDP2 Jun dimerization protein 2 NA
write.table(as.factor(out$query), paste0("../utilities/GTEX2013_sparse_load_voom/gene_names_clus_",6,".txt"), col.names = FALSE,
            row.names=FALSE, quote=FALSE);
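
The annotation tables in this report come back with their columns in a different order for each factor (compare the Factor 6 and Factor 7 headers), and some Ensembl IDs are flagged `notfound`. A minimal sketch of one way to put the columns in a fixed order and separate the unmatched IDs before printing is given below. The helper `tidy_annotations` and the toy data frame are hypothetical; the toy rows copy three entries from the Factor 6 table and stand in for `as.data.frame(out)`.

```r
# Toy stand-in for as.data.frame(out); rows copied from the Factor 6 table.
toy_out <- data.frame(
  summary  = c(NA_character_, NA_character_, NA_character_),
  X_id     = c(NA, "83699", "115548"),
  query    = c("ENSG00000255813", "ENSG00000198478", "ENSG00000157107"),
  symbol   = c(NA, "SH3BGRL2", "FCHO2"),
  name     = c(NA, "SH3 domain binding glutamate rich protein like 2",
               "FCH domain only 2"),
  notfound = c(TRUE, NA, NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper: fixed column order plus a list of unmatched query IDs.
tidy_annotations <- function(df,
                             cols = c("query", "symbol", "name",
                                      "X_id", "summary")) {
  # Add any expected column that is absent (assumption: `notfound` may be
  # missing when every ID matched), filled with NA.
  for (col in setdiff(c(cols, "notfound"), names(df))) df[[col]] <- NA
  # Treat a row as unmatched only when notfound is explicitly TRUE.
  unmatched <- !is.na(df$notfound) & df$notfound
  list(matched   = df[!unmatched, cols, drop = FALSE],
       unmatched = df$query[unmatched])
}

res <- tidy_annotations(toy_out)
# res$matched can then be passed to knitr::kable(); res$unmatched lists the
# Ensembl IDs that returned no annotation.
```

Keeping the unmatched IDs in a separate vector makes it easier to see how many of the top features per factor lack an annotation, without cluttering the printed table with all-NA rows.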

Factor 7 Annotations

out <- mygene::queryMany(gene_list[7,],  scopes="ensembl.gene", fields=c("name", "summary", "symbol"), species="human");
## Finished
## Pass returnall=TRUE to return lists of duplicate or missing query terms.
kable(as.data.frame(out))
symbol X_id name query summary notfound
TNNC1 7134 troponin C1, slow skeletal and cardiac type ENSG00000114854 Troponin is a central regulatory protein of striated muscle contraction, and together with tropomyosin, is located on the actin filament. Troponin consists of 3 subunits: TnI, which is the inhibitor of actomyosin ATPase; TnT, which contains the binding site for tropomyosin; and TnC, the protein encoded by this gene. The binding of calcium to TnC abolishes the inhibitory action of TnI, thus allowing the interaction of actin with myosin, the hydrolysis of ATP, and the generation of tension. Mutations in this gene are associated with cardiomyopathy dilated type 1Z. NA
MYOZ2 51778 myozenin 2 ENSG00000172399 The protein encoded by this gene belongs to a family of sarcomeric proteins that bind to calcineurin, a phosphatase involved in calcium-dependent signal transduction in diverse cell types. These family members tether calcineurin to alpha-actinin at the z-line of the sarcomere of cardiac and skeletal muscle cells, and thus they are important for calcineurin signaling. Mutations in this gene cause cardiomyopathy familial hypertrophic type 16, a hereditary heart disorder. NA
EEF1A2 1917 eukaryotic translation elongation factor 1 alpha 2 ENSG00000101210 This gene encodes an isoform of the alpha subunit of the elongation factor-1 complex, which is responsible for the enzymatic delivery of aminoacyl tRNAs to the ribosome. This isoform (alpha 2) is expressed in brain, heart and skeletal muscle, and the other isoform (alpha 1) is expressed in brain, placenta, lung, liver, kidney, and pancreas. This gene may be critical in the development of ovarian cancer. NA
CRYM 1428 crystallin mu ENSG00000103316 Crystallins are separated into two classes: taxon-specific and ubiquitous. The former class is also called phylogenetically-restricted crystallins. The latter class constitutes the major proteins of vertebrate eye lens and maintains the transparency and refractive index of the lens. This gene encodes a taxon-specific crystallin protein that binds NADPH and has sequence similarity to bacterial ornithine cyclodeaminases. The encoded protein does not perform a structural role in lens tissue, and instead it binds thyroid hormone for possible regulatory or developmental roles. Mutations in this gene have been associated with autosomal dominant non-syndromic deafness. NA
PKP2 5318 plakophilin 2 ENSG00000057294 This gene encodes a member of the arm-repeat (armadillo) and plakophilin gene families. Plakophilin proteins contain numerous armadillo repeats, localize to cell desmosomes and nuclei, and participate in linking cadherins to intermediate filaments in the cytoskeleton. This gene product may regulate the signaling activity of beta-catenin. Two alternately spliced transcripts encoding two protein isoforms have been identified. A processed pseudogene with high similarity to this locus has been mapped to chromosome 12p13. NA
HRC 3270 histidine rich calcium binding protein ENSG00000130528 This gene encodes a luminal sarcoplasmic reticulum protein identified by its ability to bind low-density lipoprotein with high affinity. The protein interacts with the cytoplasmic domain of triadin, the main transmembrane protein of the junctional sarcoplasmic reticulum (SR) of skeletal muscle. The protein functions in the regulation of releasable calcium into the SR. NA
FAM78A 286336 family with sequence similarity 78 member A ENSG00000126882 NA NA
TCAP 8557 titin-cap ENSG00000173991 Sarcomere assembly is regulated by the muscle protein titin. Titin is a giant elastic protein with kinase activity that extends half the length of a sarcomere. It serves as a scaffold to which myofibrils and other muscle related proteins are attached. This gene encodes a protein found in striated and cardiac muscle that binds to the titin Z1-Z2 domains and is a substrate of titin kinase, interactions thought to be critical to sarcomere assembly. Mutations in this gene are associated with limb-girdle muscular dystrophy type 2G. NA
WDR62 284403 WD repeat domain 62 ENSG00000075702 This gene is proposed to play a role in cerebral cortical development. Mutations in this gene have been associated with microencephaly, cortical malformations, and mental retardation. Alternative splicing results in multiple transcript variants. NA
PTGDS 5730 prostaglandin D2 synthase ENSG00000107317 The protein encoded by this gene is a glutathione-independent prostaglandin D synthase that catalyzes the conversion of prostaglandin H2 (PGH2) to prostaglandin D2 (PGD2). PGD2 functions as a neuromodulator as well as a trophic factor in the central nervous system. PGD2 is also involved in smooth muscle contraction/relaxation and is a potent inhibitor of platelet aggregation. This gene is preferentially expressed in brain. Studies with transgenic mice overexpressing this gene suggest that this gene may also be involved in the regulation of non-rapid eye movement sleep. NA
ACTN2 88 actinin alpha 2 ENSG00000077522 Alpha actinins belong to the spectrin gene superfamily which represents a diverse group of cytoskeletal proteins, including the alpha and beta spectrins and dystrophins. Alpha actinin is an actin-binding protein with multiple roles in different cell types. In nonmuscle cells, the cytoskeletal isoform is found along microfilament bundles and adherens-type junctions, where it is involved in binding actin to the membrane. In contrast, skeletal, cardiac, and smooth muscle isoforms are localized to the Z-disc and analogous dense bodies, where they help anchor the myofibrillar actin filaments. This gene encodes a muscle-specific, alpha actinin isoform that is expressed in both skeletal and cardiac muscles. Several transcript variants encoding different isoforms have been found for this gene. NA
MYH7B 57644 myosin, heavy chain 7B, cardiac muscle, beta ENSG00000078814 The myosin II molecule is a multi-subunit complex consisting of two heavy chains and four light chains. This gene encodes a heavy chain of myosin II, which is a member of the motor-domain superfamily. The heavy chain includes a globular motor domain, which catalyzes ATP hydrolysis and interacts with actin, and a tail domain in which heptad repeat sequences promote dimerization by interacting to form a rod-like alpha-helical coiled coil. This heavy chain subunit is a slow-twitch myosin. Alternatively spliced transcript variants have been found, but the full-length nature of these variants is not determined. NA
ACTC1 70 actin, alpha, cardiac muscle 1 ENSG00000159251 Actins are highly conserved proteins that are involved in various types of cell motility. Polymerization of globular actin (G-actin) leads to a structural filament (F-actin) in the form of a two-stranded helix. Each actin can bind to four others. The protein encoded by this gene belongs to the actin family which is comprised of three main groups of actin isoforms, alpha, beta, and gamma. The alpha actins are found in muscle tissues and are a major constituent of the contractile apparatus. Defects in this gene have been associated with idiopathic dilated cardiomyopathy (IDC) and familial hypertrophic cardiomyopathy (FHC). NA
RP11-11N9.4 ENSG00000247134 NA ENSG00000247134 NA NA
VSTM2L 128434 V-set and transmembrane domain containing 2 like ENSG00000132821 NA NA
CASQ2 845 calsequestrin 2 ENSG00000118729 The protein encoded by this gene specifies the cardiac muscle family member of the calsequestrin family. Calsequestrin is localized to the sarcoplasmic reticulum in cardiac and slow skeletal muscle cells. The protein is a calcium binding protein that stores calcium for muscle function. Mutations in this gene cause stress-induced polymorphic ventricular tachycardia, also referred to as catecholaminergic polymorphic ventricular tachycardia 2 (CPVT2), a disease characterized by bidirectional ventricular tachycardia that may lead to cardiac arrest. NA
LDB3 11155 LIM domain binding 3 ENSG00000122367 This gene encodes a PDZ domain-containing protein. PDZ motifs are modular protein-protein interaction domains consisting of 80-120 amino acid residues. PDZ domain-containing proteins interact with each other in cytoskeletal assembly or with other proteins involved in targeting and clustering of membrane proteins. The protein encoded by this gene interacts with alpha-actinin-2 through its N-terminal PDZ domain and with protein kinase C via its C-terminal LIM domains. The LIM domain is a cysteine-rich motif defined by 50-60 amino acids containing two zinc-binding modules. This protein also interacts with all three members of the myozenin family. Mutations in this gene have been associated with myofibrillar myopathy and dilated cardiomyopathy. Alternatively spliced transcript variants encoding different isoforms have been identified; all isoforms have N-terminal PDZ domains while only longer isoforms (1, 2 and 5) have C-terminal LIM domains. NA
SLC2A4 6517 solute carrier family 2 member 4 ENSG00000181856 This gene is a member of the solute carrier family 2 (facilitated glucose transporter) family and encodes a protein that functions as an insulin-regulated facilitative glucose transporter. In the absence of insulin, this integral membrane protein is sequestered within the cells of muscle and adipose tissue. Within minutes of insulin stimulation, the protein moves to the cell surface and begins to transport glucose across the cell membrane. Mutations in this gene have been associated with noninsulin-dependent diabetes mellitus (NIDDM). NA
SHISA3 152573 shisa family member 3 ENSG00000178343 NA NA
TNFAIP6 7130 TNF alpha induced protein 6 ENSG00000123610 The protein encoded by this gene is a secretory protein that contains a hyaluronan-binding domain, and thus is a member of the hyaluronan-binding protein family. The hyaluronan-binding domain is known to be involved in extracellular matrix stability and cell migration. This protein has been shown to form a stable complex with inter-alpha-inhibitor (I alpha I), and thus enhance the serine protease inhibitory activity of I alpha I, which is important in the protease network associated with inflammation. This gene can be induced by proinflammatory cytokines such as tumor necrosis factor alpha and interleukin-1. Enhanced levels of this protein are found in the synovial fluid of patients with osteoarthritis and rheumatoid arthritis. NA
FNDC5 252995 fibronectin type III domain containing 5 ENSG00000160097 This gene encodes a secreted protein that is released from muscle cells during exercise. The encoded protein may participate in the development of brown fat. Translation of the precursor protein initiates at a non-AUG start codon at a position that is conserved as an AUG start codon in other organisms. Alternative splicing results in multiple transcript variants. NA
AC017116.11 ENSG00000239775 NA ENSG00000239775 NA NA
PI16 221476 peptidase inhibitor 16 ENSG00000164530 NA NA
LTK 4058 leukocyte receptor tyrosine kinase ENSG00000062524 The protein encoded by this gene is a member of the ros/insulin receptor family of tyrosine kinases. Tyrosine-specific phosphorylation of proteins is a key to the control of diverse pathways leading to cell growth and differentiation. Multiple transcript variants encoding different isoforms have been found for this gene. NA
FHOD3 80206 formin homology 2 domain containing 3 ENSG00000134775 The protein encoded by this gene is a member of the diaphanous-related formins (DRF), and contains multiple domains, including GBD (GTPase-binding domain), DID (diaphanous inhibitory domain), FH1 (formin homology 1), FH2 (formin homology 2), and DAD (diaphanous auto-regulatory domain) domains. This protein is thought to play a role in actin filament polymerization in cardiomyocytes. Mutations in this gene have been associated with dilated cardiomyopathy (DCM), characterized by dilation of the ventricular chamber, leading to impairment of systolic pump function and subsequent heart failure. Increased levels of the protein encoded by this gene have been observed in individuals with hypertrophic cardiomyopathy (HCM). Alternative splicing results in multiple transcript variants encoding different isoforms. A muscle-specific isoform has been shown to possess a casein kinase 2 (CK2) phosphorylation site at the C-terminal end of the FH2 domain. Phosphorylation of this site alters its interaction with sequestosome 1 (SQSTM1), and targets this isoform to myofibrils, while other isoforms form cytoplasmic aggregates. NA
TSPAN32 10077 tetraspanin 32 ENSG00000064201 This gene, which is a member of the tetraspanin superfamily, is one of several tumor-suppressing subtransferable fragments located in the imprinted gene domain of chromosome 11p15.5, an important tumor-suppressor gene region. Alterations in this region have been associated with Beckwith-Wiedemann syndrome, Wilms tumor, rhabdomyosarcoma, adrenocortical carcinoma, and lung, ovarian and breast cancers. This gene is located among several imprinted genes; however, this gene, as well as the tumor-suppressing subchromosomal transferable fragment 4, escapes imprinting. This gene may play a role in malignancies and diseases that involve this region, and it is also involved in hematopoietic cell function. Alternatively spliced transcript variants have been described, but their biological validity has not been determined. NA
BCHE 590 butyrylcholinesterase ENSG00000114200 Mutant alleles at the BCHE locus are responsible for suxamethonium sensitivity. Homozygous persons sustain prolonged apnea after administration of the muscle relaxant suxamethonium in connection with surgical anesthesia. The activity of pseudocholinesterase in the serum is low and its substrate behavior is atypical. In the absence of the relaxant, the homozygote is at no known disadvantage. NA
LPL 4023 lipoprotein lipase ENSG00000175445 LPL encodes lipoprotein lipase, which is expressed in heart, muscle, and adipose tissue. LPL functions as a homodimer, and has the dual functions of triglyceride hydrolase and ligand/bridging factor for receptor-mediated lipoprotein uptake. Severe mutations that cause LPL deficiency result in type I hyperlipoproteinemia, while less extreme mutations in LPL are linked to many disorders of lipoprotein metabolism. NA
CELF2-AS1 414196 CELF2 antisense RNA 1 ENSG00000181800 NA NA
TNFRSF19 55504 tumor necrosis factor receptor superfamily member 19 ENSG00000127863 The protein encoded by this gene is a member of the TNF-receptor superfamily. This receptor is highly expressed during embryonic development. It has been shown to interact with TRAF family members, and to activate JNK signaling pathway when overexpressed in cells. This receptor is capable of inducing apoptosis by a caspase-independent mechanism, and it is thought to play an essential role in embryonic development. Alternatively spliced transcript variants encoding distinct isoforms have been described. NA
MLF1 4291 myeloid leukemia factor 1 ENSG00000178053 This gene encodes an oncoprotein which is thought to play a role in the phenotypic determination of hematopoietic cells. Translocations between this gene and nucleophosmin have been associated with myelodysplastic syndrome and acute myeloid leukemia. Multiple transcript variants encoding different isoforms have been found for this gene. NA
BLM 641 Bloom syndrome RecQ like helicase ENSG00000197299 The Bloom syndrome gene product is related to the RecQ subset of DExH box-containing DNA helicases and has both DNA-stimulated ATPase and ATP-dependent DNA helicase activities. Mutations causing Bloom syndrome delete or alter helicase motifs and may disable the 3’-5’ helicase activity. The normal protein may act to suppress inappropriate recombination. NA
TRPV3 162514 transient receptor potential cation channel subfamily V member 3 ENSG00000167723 This gene product belongs to a family of nonselective cation channels that function in a variety of processes, including temperature sensation and vasoregulation. The thermosensitive members of this family are expressed in subsets of sensory neurons that terminate in the skin, and are activated at distinct physiological temperatures. This channel is activated at temperatures between 22 and 40 degrees C. This gene lies in close proximity to another family member gene on chromosome 17, and the two encoded proteins are thought to associate with each other to form heteromeric channels. Multiple transcript variants encoding different isoforms have been found for this gene. NA
STRIP2 57464 striatin interacting protein 2 ENSG00000128578 NA NA
TSPAN18 90139 tetraspanin 18 ENSG00000157570 NA NA
ANK2 287 ankyrin 2, neuronal ENSG00000145362 This gene encodes a member of the ankyrin family of proteins that link the integral membrane proteins to the underlying spectrin-actin cytoskeleton. Ankyrins play key roles in activities such as cell motility, activation, proliferation, contact and the maintenance of specialized membrane domains. Most ankyrins are typically composed of three structural domains: an amino-terminal domain containing multiple ankyrin repeats; a central region with a highly conserved spectrin binding domain; and a carboxy-terminal regulatory domain which is the least conserved and subject to variation. The protein encoded by this gene is required for targeting and stability of Na/Ca exchanger 1 in cardiomyocytes. Mutations in this gene cause long QT syndrome 4 and cardiac arrhythmia syndrome. Multiple transcript variants encoding different isoforms have been described. NA
SGCA 6442 sarcoglycan alpha ENSG00000108823 This gene encodes a component of the dystrophin-glycoprotein complex (DGC), which is critical to the stability of muscle fiber membranes and to the linking of the actin cytoskeleton to the extracellular matrix. Its expression is thought to be restricted to striated muscle. Mutations in this gene result in type 2D autosomal recessive limb-girdle muscular dystrophy. Multiple transcript variants encoding different isoforms have been found for this gene. NA
PTGES3L 100885848 prostaglandin E synthase 3 (cytosolic)-like ENSG00000267060 NA NA
SAMD4A 23034 sterile alpha motif domain containing 4A ENSG00000020577 Sterile alpha motifs (SAMs) in proteins such as SAMD4A are part of an RNA-binding domain that functions as a posttranscriptional regulator by binding to an RNA sequence motif known as the Smaug recognition element, which was named after the Drosophila Smaug protein (Baez and Boccaccio, 2005 [PubMed 16221671]). NA
ADAMTS7 11173 ADAM metallopeptidase with thrombospondin type 1 motif 7 ENSG00000136378 The protein encoded by this gene is a member of the ADAMTS (a disintegrin and metalloproteinase with thrombospondin motifs) family. Members of this family share several distinct protein modules, including a propeptide region, a metalloproteinase domain, a disintegrin-like domain, and a thrombospondin type 1 (TS) motif. Individual members of this family differ in the number of C-terminal TS motifs, and some have unique C-terminal domains. The encoded preproprotein is proteolytically processed to generate the mature enzyme. This enzyme contains two C-terminal TS motifs and may regulate vascular smooth muscle cell (VSMC) migration. Mutations in this gene may be associated with susceptibility to coronary artery disease. NA
NA NA NA ENSG00000229164 NA TRUE
CLPS 1208 colipase ENSG00000137392 The protein encoded by this gene is a cofactor needed by pancreatic lipase for efficient dietary lipid hydrolysis. It binds to the C-terminal, non-catalytic domain of lipase, thereby stabilizing an active conformation and considerably increasing the overall hydrophobic binding site. The gene product allows lipase to anchor noncovalently to the surface of lipid micelles, counteracting the destabilizing influence of intestinal bile salts. This cofactor is only expressed in pancreatic acinar cells, suggesting regulation of expression by tissue-specific elements. Three transcript variants encoding different isoforms have been found for this gene. NA
NBPF13P ENSG00000227242 neuroblastoma breakpoint family member 13, pseudogene ENSG00000227242 NA NA
TMEM71 137835 transmembrane protein 71 ENSG00000165071 NA NA
COL23A1 91522 collagen type XXIII alpha 1 chain ENSG00000050767 COL23A1 is a member of the transmembrane collagens, a subfamily of the nonfibrillar collagens that contain a single pass hydrophobic transmembrane domain (Banyard et al., 2003 [PubMed 12644459]). NA
GAS7 8522 growth arrest specific 7 ENSG00000007237 Growth arrest-specific 7 is expressed primarily in terminally differentiated brain cells and predominantly in mature cerebellar Purkinje neurons. GAS7 plays a putative role in neuronal development. Several transcript variants encoding proteins which vary in the N-terminus have been described. NA
FLVCR2 55640 feline leukemia virus subgroup C cellular receptor family member 2 ENSG00000119686 This gene encodes a member of the major facilitator superfamily. The encoded transmembrane protein is a calcium transporter. Unlike the related protein feline leukemia virus subgroup C receptor 1, the protein encoded by this locus does not bind to feline leukemia virus subgroup C envelope protein. The encoded protein may play a role in development of brain vascular endothelial cells, as mutations at this locus have been associated with proliferative vasculopathy and hydranencephaly-hydrocephaly syndrome. Alternatively spliced transcript variants have been described. NA
FHL2 2274 four and a half LIM domains 2 ENSG00000115641 This gene encodes a member of the four-and-a-half-LIM-only protein family. Family members contain two highly conserved, tandemly arranged, zinc finger domains with four highly conserved cysteines binding a zinc atom in each zinc finger. This protein is thought to have a role in the assembly of extracellular membranes. Also, this gene is down-regulated during transformation of normal myoblasts to rhabdomyosarcoma cells and the encoded protein may function as a link between presenilin-2 and an intracellular signaling pathway. Multiple alternatively spliced variants encoding different isoforms have been identified. NA
PMP22 5376 peripheral myelin protein 22 ENSG00000109099 This gene encodes an integral membrane protein that is a major component of myelin in the peripheral nervous system. Studies suggest two alternately used promoters drive tissue-specific expression. Various mutations of this gene are causes of Charcot-Marie-Tooth disease Type IA, Dejerine-Sottas syndrome, and hereditary neuropathy with liability to pressure palsies. Alternative splicing results in multiple transcript variants. NA
DNAJA4 55466 DnaJ heat shock protein family (Hsp40) member A4 ENSG00000140403 NA NA
DPYSL4 10570 dihydropyrimidinase like 4 ENSG00000151640 NA NA
RP11-762H8.4 ENSG00000272418 NA ENSG00000272418 NA NA
RNF157 114804 ring finger protein 157 ENSG00000141576 NA NA
RP11-54F2.1 ENSG00000251196 NA ENSG00000251196 NA NA
CELF2 10659 CUGBP, Elav-like family member 2 ENSG00000048740 Members of the CELF/BRUNOL protein family contain two N-terminal RNA recognition motif (RRM) domains, one C-terminal RRM domain, and a divergent segment of 160-230 aa between the second and third RRM domains. Members of this protein family regulate pre-mRNA alternative splicing and may also be involved in mRNA editing, and translation. Alternative splicing results in multiple transcript variants encoding different isoforms. NA
PLN 5350 phospholamban ENSG00000198523 The protein encoded by this gene is found as a pentamer and is a major substrate for the cAMP-dependent protein kinase in cardiac muscle. The encoded protein is an inhibitor of cardiac muscle sarcoplasmic reticulum Ca(2+)-ATPase in the unphosphorylated state, but inhibition is relieved upon phosphorylation of the protein. The subsequent activation of the Ca(2+) pump leads to enhanced muscle relaxation rates, thereby contributing to the inotropic response elicited in heart by beta-agonists. The encoded protein is a key regulator of cardiac diastolic function. Mutations in this gene are a cause of inherited human dilated cardiomyopathy with refractory congestive heart failure, and also familial hypertrophic cardiomyopathy. NA
NHSL1 57224 NHS like 1 ENSG00000135540 NA NA
EGR2 1959 early growth response 2 ENSG00000122877 The protein encoded by this gene is a transcription factor with three tandem C2H2-type zinc fingers. Defects in this gene are associated with Charcot-Marie-Tooth disease type 1D (CMT1D), Charcot-Marie-Tooth disease type 4E (CMT4E), and with Dejerine-Sottas syndrome (DSS). Multiple transcript variants encoding two different isoforms have been found for this gene. NA
HSPB8 26353 heat shock protein family B (small) member 8 ENSG00000152137 The protein encoded by this gene belongs to the superfamily of small heat-shock proteins containing a conservative alpha-crystallin domain at the C-terminal part of the molecule. The expression of this gene is induced by estrogen in estrogen receptor-positive breast cancer cells, and this protein also functions as a chaperone in association with Bag3, a stimulator of macroautophagy. Thus, this gene appears to be involved in regulation of cell proliferation, apoptosis, and carcinogenesis, and mutations in this gene have been associated with different neuromuscular diseases, including Charcot-Marie-Tooth disease. NA
SHCBP1 79801 SHC binding and spindle associated 1 ENSG00000171241 NA NA
KBTBD8 84541 kelch repeat and BTB domain containing 8 ENSG00000163376 NA NA
MTND2P28 ENSG00000225630 mitochondrially encoded NADH:ubiquinone oxidoreductase core subunit 2 pseudogene 28 ENSG00000225630 NA NA
GPR183 1880 G protein-coupled receptor 183 ENSG00000169508 This gene was identified by the up-regulation of its expression upon Epstein-Barr virus infection of primary B lymphocytes. This gene is predicted to encode a G protein-coupled receptor that is most closely related to the thrombin receptor. Expression of this gene was detected in B-lymphocyte cell lines and lymphoid tissues but not in T-lymphocyte cell lines or peripheral blood T lymphocytes. The function of this gene is unknown. NA
AC019349.5 ENSG00000229732 NA ENSG00000229732 NA NA
CELA2A 63036 chymotrypsin like elastase family member 2A ENSG00000142615 Elastases form a subfamily of serine proteases that hydrolyze many proteins in addition to elastin. Humans have six elastase genes which encode the structurally similar proteins elastase 1, 2, 2A, 2B, 3A, and 3B. Like most of the human elastases, elastase 2A is secreted from the pancreas as a zymogen. In other species, elastase 2A has been shown to preferentially cleave proteins after leucine, methionine, and phenylalanine residues. NA
MFAP5 8076 microfibrillar associated protein 5 ENSG00000197614 This gene encodes a 25-kD microfibril-associated glycoprotein which is a component of microfibrils of the extracellular matrix. The encoded protein promotes attachment of cells to microfibrils via alpha-V-beta-3 integrin. Deficiency of this gene in mice results in neutropenia. Alternate splicing results in multiple transcript variants encoding different isoforms. NA
HYAL1 3373 hyaluronoglucosaminidase 1 ENSG00000114378 This gene encodes a lysosomal hyaluronidase. Hyaluronidases intracellularly degrade hyaluronan, one of the major glycosaminoglycans of the extracellular matrix. Hyaluronan is thought to be involved in cell proliferation, migration and differentiation. This enzyme is active at an acidic pH and is the major hyaluronidase in plasma. Mutations in this gene are associated with mucopolysaccharidosis type IX, or hyaluronidase deficiency. The gene is one of several related genes in a region of chromosome 3p21.3 associated with tumor suppression. Multiple transcript variants encoding different isoforms have been found for this gene. NA
RCSD1 92241 RCSD domain containing 1 ENSG00000198771 NA NA
MTND1P23 ENSG00000225972 mitochondrially encoded NADH:ubiquinone oxidoreductase core subunit 1 pseudogene 23 ENSG00000225972 NA NA
HK3 3101 hexokinase 3 ENSG00000160883 Hexokinases phosphorylate glucose to produce glucose-6-phosphate, the first step in most glucose metabolism pathways. This gene encodes hexokinase 3. Similar to hexokinases 1 and 2, this allosteric enzyme is inhibited by its product glucose-6-phosphate. NA
LIPF 8513 lipase F, gastric type ENSG00000182333 This gene encodes gastric lipase, an enzyme involved in the digestion of dietary triglycerides in the gastrointestinal tract, and responsible for 30% of fat digestion processes occurring in humans. It is secreted by gastric chief cells in the fundic mucosa of the stomach, and it hydrolyzes the ester bonds of triglycerides under acidic pH conditions. The gene is a member of a conserved gene family of lipases that play distinct roles in neutral lipid metabolism. Several transcript variants encoding different isoforms have been found for this gene. NA
GPR137B 7107 G protein-coupled receptor 137B ENSG00000077585 NA NA
OPLAH 26873 5-oxoprolinase (ATP-hydrolysing) ENSG00000178814 The protein encoded by this gene acts as a homodimer, using ATP hydrolysis to catalyze the conversion of 5-oxo-L-proline to L-glutamate. Defects in this gene are a cause of 5-oxoprolinase deficiency (OPLAHD). NA
PCOLCE2 26577 procollagen C-endopeptidase enhancer 2 ENSG00000163710 NA NA
ADAM23 8745 ADAM metallopeptidase domain 23 ENSG00000114948 This gene encodes a member of the ADAM (a disintegrin and metalloprotease domain) family. Members of this family are membrane-anchored proteins structurally related to snake venom disintegrins and have been implicated in a variety of biological processes involving cell-cell and cell-matrix interactions, including fertilization, muscle development, and neurogenesis. It is reported that inactivation of this gene is associated with tumorigenesis in human cancers. NA
KCNIP2 30819 potassium voltage-gated channel interacting protein 2 ENSG00000120049 This gene encodes a member of the family of voltage-gated potassium (Kv) channel-interacting proteins (KCNIPs), which belongs to the recoverin branch of the EF-hand superfamily. Members of the KCNIP family are small calcium binding proteins. They all have EF-hand-like domains, and differ from each other in the N-terminus. They are integral subunit components of native Kv4 channel complexes. They may regulate A-type currents, and hence neuronal excitability, in response to changes in intracellular calcium. Multiple alternatively spliced transcript variants encoding distinct isoforms have been identified from this gene. NA
SCOC-AS1 100129858 SCOC antisense RNA 1 ENSG00000196951 NA NA
TNXB 7148 tenascin XB ENSG00000168477 This gene encodes a member of the tenascin family of extracellular matrix glycoproteins. The tenascins have anti-adhesive effects, as opposed to fibronectin which is adhesive. This protein is thought to function in matrix maturation during wound healing, and its deficiency has been associated with the connective tissue disorder Ehlers-Danlos syndrome. This gene localizes to the major histocompatibility complex (MHC) class III region on chromosome 6. It is one of four genes in this cluster which have been duplicated. The duplicated copy of this gene is incomplete and is a pseudogene which is transcribed but does not encode a protein. The structure of this gene is unusual in that it overlaps the CREBL1 and CYP21A2 genes at its 5’ and 3’ ends, respectively. Multiple transcript variants encoding different isoforms have been found for this gene. NA
KRT2 3849 keratin 2 ENSG00000172867 The protein encoded by this gene is a member of the keratin gene family. The type II cytokeratins consist of basic or neutral proteins which are arranged in pairs of heterotypic keratin chains coexpressed during differentiation of simple and stratified epithelial tissues. This type II cytokeratin is expressed largely in the upper spinous layer of epidermal keratinocytes and mutations in this gene have been associated with bullous congenital ichthyosiform erythroderma. The type II cytokeratins are clustered in a region of chromosome 12q12-q13. NA
THBS4 7060 thrombospondin 4 ENSG00000113296 The protein encoded by this gene belongs to the thrombospondin protein family. Thrombospondin family members are adhesive glycoproteins that mediate cell-to-cell and cell-to-matrix interactions. This protein forms a pentamer and can bind to heparin and calcium. It is involved in local signaling in the developing and adult nervous system, and it contributes to spinal sensitization and neuropathic pain states. This gene is activated during the stromal response to invasive breast cancer. It may also play a role in inflammatory responses in Alzheimer’s disease. Alternative splicing results in multiple transcript variants. NA
JAM2 58494 junctional adhesion molecule 2 ENSG00000154721 This gene belongs to the immunoglobulin superfamily, and the junctional adhesion molecule (JAM) family. The protein encoded by this gene is a type I membrane protein that is localized at the tight junctions of both epithelial and endothelial cells. It acts as an adhesive ligand for interacting with a variety of immune cell types, and may play a role in lymphocyte homing to secondary lymphoid organs. Alternatively spliced transcript variants have been found for this gene. NA
APOBEC2 10930 apolipoprotein B mRNA editing enzyme catalytic subunit 2 ENSG00000124701 NA NA
CRNN 49860 cornulin ENSG00000143536 This gene encodes a member of the ‘fused gene’ family of proteins, which contain N-terminus EF-hand domains and multiple tandem peptide repeats. The encoded protein contains two EF-hand Ca2+ binding domains in its N-terminus and two glutamine- and threonine-rich 60 amino acid repeats in its C-terminus. This gene, also known as squamous epithelial heat shock protein 53, may play a role in the mucosal/epithelial immune response and epidermal differentiation. NA
PLPP7 84814 phospholipid phosphatase 7 (inactive) ENSG00000160539 NA NA
LOC101928718 101928718 uncharacterized LOC101928718 ENSG00000197852 NA NA
FAM212B 55924 family with sequence similarity 212 member B ENSG00000197852 NA NA
COLGALT2 23127 collagen beta(1-O)galactosyltransferase 2 ENSG00000198756 NA NA
ANKRD9 122416 ankyrin repeat domain 9 ENSG00000156381 NA NA
NA NA NA ENSG00000204794 NA TRUE
PKIA 5569 protein kinase (cAMP-dependent, catalytic) inhibitor alpha ENSG00000171033 The protein encoded by this gene is a member of the cAMP-dependent protein kinase (PKA) inhibitor family. This protein was demonstrated to interact with and inhibit the activities of both C alpha and C beta catalytic subunits of the PKA. Alternatively spliced transcript variants encoding the same protein have been reported. NA
CPNE5 57699 copine 5 ENSG00000124772 Calcium-dependent membrane-binding proteins may regulate molecular events at the interface of the cell membrane and cytoplasm. This gene is one of several genes that encode a calcium-dependent protein containing two N-terminal type II C2 domains and an integrin A domain-like sequence in the C-terminus. Several alternatively spliced transcript variants encoding different isoforms have been found for this gene. More variants may exist, but their full-length natures could not be determined. NA
TMEM176B 28959 transmembrane protein 176B ENSG00000106565 NA NA
SPX 80763 spexin hormone ENSG00000134548 The protein encoded by this gene is a hormone involved in modulation of cardiovascular and renal function. It has also been shown in rats to cause weight loss. Several transcript variants have been found for this gene. NA
NKD2 85409 naked cuticle homolog 2 ENSG00000145506 This gene encodes a member of a family of proteins that function as negative regulators of Wnt receptor signaling through interaction with Dishevelled family members. The encoded protein participates in the delivery of transforming growth factor alpha-containing vesicles to the cell membrane. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene. NA
CMYA5 202333 cardiomyopathy associated 5 ENSG00000164309 NA NA
P2RX7 5027 purinergic receptor P2X 7 ENSG00000089041 The product of this gene belongs to the family of purinoceptors for ATP. This receptor functions as a ligand-gated ion channel and is responsible for ATP-dependent lysis of macrophages through the formation of membrane pores permeable to large molecules. Activation of this nuclear receptor by ATP in the cytoplasm may be a mechanism by which cellular activity can be coupled to changes in gene expression. Multiple alternatively spliced variants have been identified, most of which fit nonsense-mediated decay (NMD) criteria. NA
KIF1A 547 kinesin family member 1A ENSG00000130294 The protein encoded by this gene is a member of the kinesin family and functions as an anterograde motor protein that transports membranous organelles along axonal microtubules. Mutations at this locus have been associated with spastic paraplegia-30 and hereditary sensory neuropathy IIC. Alternatively spliced transcript variants encoding distinct isoforms have been described. NA
GATB 5188 glutamyl-tRNA(Gln) amidotransferase, subunit B ENSG00000059691 NA NA
PYGM 5837 phosphorylase, glycogen, muscle ENSG00000068976 This gene encodes a muscle enzyme involved in glycogenolysis. Highly similar enzymes encoded by different genes are found in liver and brain. Mutations in this gene are associated with McArdle disease (myophosphorylase deficiency), a glycogen storage disease of muscle. Alternative splicing results in multiple transcript variants. NA
RP1-253P7.4 ENSG00000197815 NA ENSG00000197815 NA NA
TNFRSF12A 51330 tumor necrosis factor receptor superfamily member 12A ENSG00000006327 NA NA
write.table(as.factor(out$query),
            paste0("../utilities/GTEX2013_sparse_load_voom/gene_names_clus_", 7, ".txt"),
            col.names = FALSE, row.names = FALSE, quote = FALSE)
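
In the Factor 7 table above, the query ENSG00000197852 appears twice (it maps to both LOC101928718 and FAM212B) and ENSG00000204794 is flagged as not found, so writing `out$query` directly puts a repeated ID and an unannotated ID into the cluster file. The following is a minimal sketch of one way to filter them first, assuming that is not the intended behaviour. `clean_query_ids` is a hypothetical helper and not part of the original analysis; the toy data frame stands in for a `queryMany` result, and the file is written to a temporary directory rather than to `../utilities/`.

```r
# Hypothetical helper (not in the original analysis): return each
# successfully annotated query ID exactly once.
clean_query_ids <- function(out) {
  out <- as.data.frame(out)
  # The `notfound` column is only present when some IDs failed to map
  if ("notfound" %in% names(out)) {
    out <- out[is.na(out$notfound) | !out$notfound, , drop = FALSE]
  }
  unique(as.character(out$query))
}

# Toy stand-in for a queryMany() result, using rows from the table above
toy <- data.frame(
  query    = c("ENSG00000197852", "ENSG00000197852",
               "ENSG00000204794", "ENSG00000171033"),
  symbol   = c("LOC101928718", "FAM212B", NA, "PKIA"),
  notfound = c(NA, NA, TRUE, NA),
  stringsAsFactors = FALSE
)

ids <- clean_query_ids(toy)

# Assumption: one ID per line, same layout as the write.table() call above
out_file <- file.path(tempdir(), "gene_names_clus_7.txt")
writeLines(ids, out_file)
```

Whether unannotated IDs should stay in the cluster files depends on what the downstream tool expects; if they should, the `notfound` filter can be dropped and only `unique()` kept.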

Factor 8 Annotations

out <- mygene::queryMany(gene_list[8,],  scopes="ensembl.gene", fields=c("name", "summary", "symbol"), species="human");
## Finished
## Pass returnall=TRUE to return lists of duplicate or missing query terms.
kable(as.data.frame(out))
symbol X_id summary query name notfound
C10orf10 11067 The expression of this gene is induced by fasting as well as by progesterone. The protein encoded by this gene contains a t-synaptosome-associated protein receptor (SNARE) coiled-coil homology domain and a peroxisomal targeting signal. Production of the encoded protein leads to phosphorylation and activation of the transcription factor ELK1. ENSG00000165507 chromosome 10 open reading frame 10 NA
SNHG25 ENSG00000266402 NA ENSG00000266402 small nucleolar RNA host gene 25 NA
CXCL2 2920 This antimicrobial gene is part of a chemokine superfamily that encodes secreted proteins involved in immunoregulatory and inflammatory processes. The superfamily is divided into four subfamilies based on the arrangement of the N-terminal cysteine residues of the mature peptide. This chemokine, a member of the CXC subfamily, is expressed at sites of inflammation and may suppress hematopoietic progenitor cell proliferation. ENSG00000081041 C-X-C motif chemokine ligand 2 NA
NUPR1 26471 NA ENSG00000176046 nuclear protein 1, transcriptional regulator NA
HSPD1P1 ENSG00000213430 NA ENSG00000213430 heat shock protein family D (Hsp60) member 1 pseudogene 1 NA
ERRFI1 54206 ERRFI1 is a cytoplasmic protein whose expression is upregulated with cell growth (Wick et al., 1995 [PubMed 7641805]). It shares significant homology with the protein product of rat gene-33, which is induced during cell stress and mediates cell signaling (Makkinje et al., 2000 [PubMed 10749885]; Fiorentino et al., 2000 [PubMed 11003669]). ENSG00000116285 ERBB receptor feedback inhibitor 1 NA
MT1M 4499 This gene encodes a member of the metallothionein superfamily, type 1 family. Metallothioneins have a high content of cysteine residues that bind various heavy metals. These genes are transcriptionally regulated by both heavy metals and glucocorticoids. ENSG00000205364 metallothionein 1M NA
N4BP2L1 90634 NA ENSG00000139597 NEDD4 binding protein 2-like 1 NA
RP11-442H21.2 ENSG00000269926 NA ENSG00000269926 NA NA
ANGPTL4 51129 This gene encodes a glycosylated, secreted protein containing a C-terminal fibrinogen domain. The encoded protein is induced by peroxisome proliferation activators and functions as a serum hormone that regulates glucose homeostasis, lipid metabolism, and insulin sensitivity. This protein can also act as an apoptosis survival factor for vascular endothelial cells and can prevent metastasis by inhibiting vascular growth and tumor cell invasion. The C-terminal domain may be proteolytically-cleaved from the full-length secreted protein. Decreased expression of this gene has been associated with type 2 diabetes. Alternative splicing results in multiple transcript variants. This gene was previously referred to as ANGPTL2 but has been renamed ANGPTL4. ENSG00000167772 angiopoietin like 4 NA
CTH 1491 This gene encodes a cytoplasmic enzyme in the trans-sulfuration pathway that converts cystathionine derived from methionine into cysteine. Glutathione synthesis in the liver is dependent upon the availability of cysteine. Mutations in this gene cause cystathioninuria. Alternative splicing of this gene results in three transcript variants encoding different isoforms. ENSG00000116761 cystathionine gamma-lyase NA
INSIG1 3638 Oxysterols regulate cholesterol homeostasis through the liver X receptor (LXR)- and sterol regulatory element-binding protein (SREBP)-mediated signaling pathways. This gene is an insulin-induced gene. It encodes an endoplasmic reticulum (ER) membrane protein that plays a critical role in regulating cholesterol concentrations in cells. This protein binds to the sterol-sensing domains of SREBP cleavage-activating protein (SCAP) and HMG CoA reductase, and is essential for the sterol-mediated trafficking of the two proteins. Alternatively spliced transcript variants encoding distinct isoforms have been observed. ENSG00000186480 insulin induced gene 1 NA
DDIT4 54541 NA ENSG00000168209 DNA damage inducible transcript 4 NA
BHLHE40-AS1 100507582 NA ENSG00000235831 BHLHE40 antisense RNA 1 NA
L3MBTL4 91133 NA ENSG00000154655 l(3)mbt-like 4 (Drosophila) NA
MPP6 51678 Members of the peripheral membrane-associated guanylate kinase (MAGUK) family function in tumor suppression and receptor clustering by forming multiprotein complexes containing distinct sets of transmembrane, cytoskeletal, and cytoplasmic signaling proteins. All MAGUKs contain a PDZ-SH3-GUK core and are divided into 4 subfamilies, DLG-like (see DLG1; MIM 601014), ZO1-like (see TJP1; MIM 601009), p55-like (see MPP1; MIM 305360), and LIN2-like (see CASK; MIM 300172), based on their size and the presence of additional domains. MPP6 is a member of the p55-like MAGUK subfamily (Tseng et al., 2001 [PubMed 11311936]). ENSG00000105926 membrane palmitoylated protein 6 NA
PHGDH 26227 This gene encodes the enzyme which is involved in the early steps of L-serine synthesis in animal cells. L-serine is required for D-serine and other amino acid synthesis. The enzyme requires NAD/NADH as a cofactor and forms homotetramers for activity. Mutations in this gene have been found in a family with congenital microcephaly, psychomotor retardation and other symptoms. Multiple alternatively spliced transcript variants have been found, however the full-length nature of most are not known. ENSG00000092621 phosphoglycerate dehydrogenase NA
SNORA32 692063 NA ENSG00000206799 small nucleolar RNA, H/ACA box 32 NA
RGS1 5996 This gene encodes a member of the regulator of G-protein signalling family. This protein is located on the cytosolic side of the plasma membrane and contains a conserved, 120 amino acid motif called the RGS domain. The protein attenuates the signalling activity of G-proteins by binding to activated, GTP-bound G alpha subunits and acting as a GTPase activating protein (GAP), increasing the rate of conversion of the GTP to GDP. This hydrolysis allows the G alpha subunits to bind G beta/gamma subunit heterodimers, forming inactive G-protein heterotrimers, thereby terminating the signal. ENSG00000090104 regulator of G-protein signaling 1 NA
AC010761.9 ENSG00000265474 NA ENSG00000265474 NA NA
SPOCK1 6695 This gene encodes the protein core of a seminal plasma proteoglycan containing chondroitin- and heparan-sulfate chains. The protein’s function is unknown, although similarity to thyropin-type cysteine protease-inhibitors suggests its function may be related to protease inhibition. ENSG00000152377 sparc/osteonectin, cwcv and kazal-like domains proteoglycan (testican) 1 NA
PAWR 5074 The tumor suppressor WT1 represses and activates transcription. The protein encoded by this gene is a WT1-interacting protein that itself functions as a transcriptional repressor. It contains a putative leucine zipper domain which interacts with the zinc finger DNA binding domain of WT1. This protein is specifically upregulated during apoptosis of prostate cells. ENSG00000177425 pro-apoptotic WT1 regulator NA
RPL35P1 ENSG00000237991 NA ENSG00000237991 ribosomal protein L35 pseudogene 1 NA
CX3CL1 6376 NA ENSG00000006210 C-X3-C motif chemokine ligand 1 NA
HSPD1 3329 This gene encodes a member of the chaperonin family. The encoded mitochondrial protein may function as a signaling molecule in the innate immune system. This protein is essential for the folding and assembly of newly imported proteins in the mitochondria. This gene is adjacent to a related family member and the region between the 2 genes functions as a bidirectional promoter. Several pseudogenes have been associated with this gene. Two transcript variants encoding the same protein have been identified for this gene. Mutations associated with this gene cause autosomal recessive spastic paraplegia 13. ENSG00000144381 heat shock protein family D (Hsp60) member 1 NA
AC010761.10 ENSG00000265840 NA ENSG00000265840 NA NA
STX17-AS1 441461 NA ENSG00000255145 STX17 antisense RNA 1 NA
RP5-1112D6.7 ENSG00000271789 NA ENSG00000271789 NA NA
NSUN6 221078 NA ENSG00000241058 NOP2/Sun RNA methyltransferase family member 6 NA
TSPAN12 23554 The protein encoded by this gene is a member of the transmembrane 4 superfamily, also known as the tetraspanin family. Most of these members are cell-surface proteins that are characterized by the presence of four hydrophobic domains. The proteins mediate signal transduction events that play a role in the regulation of cell development, activation, growth and motility. ENSG00000106025 tetraspanin 12 NA
AC079922.2 ENSG00000231747 NA ENSG00000231747 NA NA
RPL9P32 ENSG00000242100 NA ENSG00000242100 ribosomal protein L9 pseudogene 32 NA
UGDH 7358 The protein encoded by this gene converts UDP-glucose to UDP-glucuronate and thereby participates in the biosynthesis of glycosaminoglycans such as hyaluronan, chondroitin sulfate, and heparan sulfate. These glycosylated compounds are common components of the extracellular matrix and likely play roles in signal transduction, cell migration, and cancer growth and metastasis. The expression of this gene is up-regulated by transforming growth factor beta and down-regulated by hypoxia. Alternative splicing results in multiple transcript variants. ENSG00000109814 UDP-glucose 6-dehydrogenase NA
RP11-42O15.3 ENSG00000271992 NA ENSG00000271992 NA NA
TRAF4 9618 This gene encodes a member of the TNF receptor associated factor (TRAF) family. TRAF proteins are associated with, and mediate signal transduction from members of the TNF receptor superfamily. The encoded protein has been shown to interact with neurotrophin receptor, p75 (NTR/NTSR1), and negatively regulate NTR induced cell death and NF-kappa B activation. This protein has been found to bind to p47phox, a cytosolic regulatory factor included in a multi-protein complex known as NAD(P)H oxidase. This protein thus, is thought to be involved in the oxidative activation of MAPK8/JNK. Alternatively spliced transcript variants have been observed but the full-length nature of only one has been determined. ENSG00000076604 TNF receptor associated factor 4 NA
HSPE1 3336 This gene encodes a major heat shock protein which functions as a chaperonin. Its structure consists of a heptameric ring which binds to another heat shock protein in order to form a symmetric, functional heterodimer which enhances protein folding in an ATP-dependent manner. This gene and its co-chaperonin, HSPD1, are arranged in a head-to-head orientation on chromosome 2. Naturally occurring read-through transcription occurs between this locus and the neighboring locus MOBKL3. ENSG00000115541 heat shock protein family E (Hsp10) member 1 NA
NADK2 133686 This gene encodes a mitochondrial kinase that catalyzes the phosphorylation of NAD to yield NADP. Mutations in this gene result in 2,4-dienoyl-CoA reductase deficiency. Alternative splicing results in multiple transcript variants. ENSG00000152620 NAD kinase 2, mitochondrial NA
NA NA NA ENSG00000261280 NA TRUE
PIGHP1 ENSG00000259657 NA ENSG00000259657 phosphatidylinositol glycan anchor biosynthesis class H pseudogene 1 NA
LINC01473 101927217 NA ENSG00000237877 long intergenic non-protein coding RNA 1473 NA
IL6 3569 This gene encodes a cytokine that functions in inflammation and the maturation of B cells. In addition, the encoded protein has been shown to be an endogenous pyrogen capable of inducing fever in people with autoimmune diseases or infections. The protein is primarily produced at sites of acute and chronic inflammation, where it is secreted into the serum and induces a transcriptional inflammatory response through interleukin 6 receptor, alpha. The functioning of this gene is implicated in a wide variety of inflammation-associated disease states, including susceptibility to diabetes mellitus and systemic juvenile rheumatoid arthritis. Alternative splicing results in multiple transcript variants. ENSG00000136244 interleukin 6 NA
RP11-54O7.14 ENSG00000242590 NA ENSG00000242590 NA NA
DFNB59 494513 The protein encoded by this gene is a member of the gasdermin family, a family which is found only in vertebrates. The encoded protein is required for the proper function of auditory pathway neurons. Defects in this gene are a cause of non-syndromic sensorineural deafness autosomal recessive type 59 (DFNB59). ENSG00000204311 deafness, autosomal recessive 59 NA
AC009404.2 ENSG00000236255 NA ENSG00000236255 NA NA
AOX1 316 Aldehyde oxidase produces hydrogen peroxide and, under certain conditions, can catalyze the formation of superoxide. Aldehyde oxidase is a candidate gene for amyotrophic lateral sclerosis. ENSG00000138356 aldehyde oxidase 1 NA
RP11-83J16.1 ENSG00000231409 NA ENSG00000231409 NA NA
RPL35P5 ENSG00000225573 NA ENSG00000225573 ribosomal protein L35 pseudogene 5 NA
HESX1 8820 This gene encodes a conserved homeobox protein that is a transcriptional repressor in the developing forebrain and pituitary gland. Mutations in this gene are associated with septooptic dysplasia, HESX1-related growth hormone deficiency, and combined pituitary hormone deficiency. ENSG00000163666 HESX homeobox 1 NA
RARRES1 5918 This gene was identified as a retinoic acid (RA) receptor-responsive gene. It encodes a type 1 membrane protein. The expression of this gene is upregulated by tazarotene as well as by retinoic acid receptors. The expression of this gene is found to be downregulated in prostate cancer, which is caused by the methylation of its promoter and CpG island. Alternatively spliced transcript variants encoding distinct isoforms have been observed. ENSG00000118849 retinoic acid receptor responder 1 NA
C8orf4 56892 This gene encodes a small, monomeric, predominantly unstructured protein that functions as a positive regulator of the Wnt/beta-catenin signaling pathway. This protein interacts with a repressor of beta-catenin mediated transcription at nuclear speckles. It is thought to competitively block interactions of the repressor with beta-catenin, resulting in up-regulation of beta-catenin target genes. The encoded protein may also play a role in the NF-kappaB and ERK1/2 signaling pathways. Expression of this gene may play a role in the proliferation of several types of cancer including thyroid cancer, breast cancer and hematological malignancies. ENSG00000176907 chromosome 8 open reading frame 4 NA
NOTCH4 4855 This gene encodes a member of the NOTCH family of proteins. Members of this Type I transmembrane protein family share structural characteristics including an extracellular domain consisting of multiple epidermal growth factor-like (EGF) repeats, and an intracellular domain consisting of multiple different domain types. Notch signaling is an evolutionarily conserved intercellular signaling pathway that regulates interactions between physically adjacent cells through binding of Notch family receptors to their cognate ligands. The encoded preproprotein is proteolytically processed in the trans-Golgi network to generate two polypeptide chains that heterodimerize to form the mature cell-surface receptor. This receptor may play a role in vascular, renal and hepatic development. Mutations in this gene may be associated with schizophrenia. Alternative splicing results in multiple transcript variants, at least one of which encodes an isoform that is proteolytically processed. ENSG00000204301 notch 4 NA
EGR2 1959 The protein encoded by this gene is a transcription factor with three tandem C2H2-type zinc fingers. Defects in this gene are associated with Charcot-Marie-Tooth disease type 1D (CMT1D), Charcot-Marie-Tooth disease type 4E (CMT4E), and with Dejerine-Sottas syndrome (DSS). Multiple transcript variants encoding two different isoforms have been found for this gene. ENSG00000122877 early growth response 2 NA
NA NA NA ENSG00000269165 NA TRUE
CKS1B 1163 CKS1B protein binds to the catalytic subunit of the cyclin dependent kinases and is essential for their biological function. The CKS1B mRNA is found to be expressed in different patterns through the cell cycle in HeLa cells, which reflects a specialized role for the encoded protein. At least two transcript variants have been identified for this gene, and it appears that only one of them encodes a protein. ENSG00000173207 CDC28 protein kinase regulatory subunit 1B NA
DHODH 1723 The protein encoded by this gene catalyzes the fourth enzymatic step, the ubiquinone-mediated oxidation of dihydroorotate to orotate, in de novo pyrimidine biosynthesis. This protein is a mitochondrial protein located on the outer surface of the inner mitochondrial membrane. ENSG00000102967 dihydroorotate dehydrogenase (quinone) NA
EIF4EBP1 1978 This gene encodes one member of a family of translation repressor proteins. The protein directly interacts with eukaryotic translation initiation factor 4E (eIF4E), which is a limiting component of the multisubunit complex that recruits 40S ribosomal subunits to the 5’ end of mRNAs. Interaction of this protein with eIF4E inhibits complex assembly and represses translation. This protein is phosphorylated in response to various signals including UV irradiation and insulin signaling, resulting in its dissociation from eIF4E and activation of mRNA translation. ENSG00000187840 eukaryotic translation initiation factor 4E binding protein 1 NA
GCH1 2643 This gene encodes a member of the GTP cyclohydrolase family. The encoded protein is the first and rate-limiting enzyme in tetrahydrobiopterin (BH4) biosynthesis, catalyzing the conversion of GTP into 7,8-dihydroneopterin triphosphate. BH4 is an essential cofactor required by aromatic amino acid hydroxylases as well as nitric oxide synthases. Mutations in this gene are associated with malignant hyperphenylalaninemia and dopa-responsive dystonia. Several alternatively spliced transcript variants encoding different isoforms have been described; however, not all variants give rise to a functional enzyme. ENSG00000131979 GTP cyclohydrolase 1 NA
SMO 6608 The protein encoded by this gene is a G protein-coupled receptor that interacts with the patched protein, a receptor for hedgehog proteins. The encoded protein transduces signals to other proteins after activation by a hedgehog protein/patched protein complex. ENSG00000128602 smoothened, frizzled class receptor NA
MMAB 326625 This gene encodes a protein that catalyzes the final step in the conversion of vitamin B(12) into adenosylcobalamin (AdoCbl), a vitamin B12-containing coenzyme for methylmalonyl-CoA mutase. Mutations in the gene are the cause of vitamin B12-dependent methylmalonic aciduria linked to the cblB complementation group. Alternatively spliced transcript variants have been found. ENSG00000139428 methylmalonic aciduria (cobalamin deficiency) cblB type NA
GJA4 2701 This gene encodes a member of the connexin gene family. The encoded protein is a component of gap junctions, which are composed of arrays of intercellular channels that provide a route for the diffusion of low molecular weight materials from cell to cell. Mutations in this gene have been associated with atherosclerosis and a higher risk of myocardial infarction. ENSG00000187513 gap junction protein alpha 4 NA
RPS7P3 ENSG00000231940 NA ENSG00000231940 ribosomal protein S7 pseudogene 3 NA
RP11-592N21.1 ENSG00000212664 NA ENSG00000212664 NA NA
PPP1R14B 26472 NA ENSG00000173457 protein phosphatase 1 regulatory inhibitor subunit 14B NA
HSD17B7 51478 HSD17B7 encodes an enzyme that functions both as a 17-beta-hydroxysteroid dehydrogenase (EC 1.1.1.62) in the biosynthesis of sex steroids and as a 3-ketosteroid reductase (EC 1.1.1.270) in the biosynthesis of cholesterol (Marijanovic et al., 2003 [PubMed 12829805]). ENSG00000132196 hydroxysteroid 17-beta dehydrogenase 7 NA
AFMID 125061 NA ENSG00000183077 arylformamidase NA
SLC43A1 8501 SLC43A1 belongs to the system L family of plasma membrane carrier proteins that transports large neutral amino acids (Babu et al., 2003 [PubMed 12930836]). ENSG00000149150 solute carrier family 43 member 1 NA
ZFP36L1 677 This gene is a member of the TIS11 family of early response genes, which are induced by various agonists such as the phorbol ester TPA and the polypeptide mitogen EGF. This gene is well conserved across species and has a promoter that contains motifs seen in other early-response genes. The encoded protein contains a distinguishing putative zinc finger domain with a repeating cys-his motif. This putative nuclear transcription factor most likely functions in regulating the response to growth factors. Alternatively spliced transcript variants encoding different isoforms have been found for this gene. ENSG00000185650 ZFP36 ring finger protein-like 1 NA
IFITM4P ENSG00000235821 NA ENSG00000235821 interferon induced transmembrane protein 4 pseudogene NA
CTC-301O7.4 ENSG00000197813 NA ENSG00000197813 NA NA
CRYZ 1429 Crystallins are separated into two classes: taxon-specific, or enzyme, and ubiquitous. The latter class constitutes the major proteins of vertebrate eye lens and maintains the transparency and refractive index of the lens. The former class is also called phylogenetically-restricted crystallins. This gene encodes a taxon-specific crystallin protein which has NADPH-dependent quinone reductase activity distinct from other known quinone reductases. It lacks alcohol dehydrogenase activity although by similarity it is considered a member of the zinc-containing alcohol dehydrogenase family. Unlike other mammalian species, in humans, lens expression is low. Alternatively spliced transcript variants encoding different isoforms have been found for this gene. One pseudogene is known to exist. ENSG00000116791 crystallin zeta NA
RPL17P50 ENSG00000213700 NA ENSG00000213700 ribosomal protein L17 pseudogene 50 NA
RPS2P48 ENSG00000233380 NA ENSG00000233380 ribosomal protein S2 pseudogene 48 NA
NA NA NA ENSG00000273097 NA TRUE
TOP1MT 116447 This gene encodes a mitochondrial DNA topoisomerase that plays a role in the modification of DNA topology. The encoded protein is a type IB topoisomerase and catalyzes the transient breaking and rejoining of DNA to relieve tension and DNA supercoiling generated in the mitochondrial genome during replication and transcription. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene. ENSG00000184428 topoisomerase (DNA) I, mitochondrial NA
NA NA NA ENSG00000269999 NA TRUE
ELL2P1 ENSG00000227295 NA ENSG00000227295 elongation factor, RNA polymerase II, 2 pseudogene 1 NA
ASS1P2 ENSG00000223922 NA ENSG00000223922 argininosuccinate synthetase 1 pseudogene 2 NA
OLMALINC ENSG00000235823 NA ENSG00000235823 oligodendrocyte maturation-associated long intergenic non-coding RNA NA
RP11-727A23.4 ENSG00000254676 NA ENSG00000254676 NA NA
PA2G4P4 ENSG00000230457 NA ENSG00000230457 proliferation-associated 2G4 pseudogene 4 NA
RCL1 10171 NA ENSG00000120158 RNA terminal phosphate cyclase like 1 NA
HSPA4L 22824 The protein encoded by this gene is heat shock inducible and may act as a chaperone. The encoded protein can protect the heat-shocked cell against the harmful effects of aggregated proteins. This gene is highly expressed in leukemia cells and may be a good target for therapeutic intervention. Several transcripts encoding different isoforms have been found for this gene. ENSG00000164070 heat shock protein family A (Hsp70) member 4 like NA
GRHL1 29841 This gene encodes a member of the grainyhead family of transcription factors. The encoded protein can exist as a homodimer or can form heterodimers with sister-of-mammalian grainyhead or brother-of-mammalian grainyhead. This protein functions as a transcription factor during development. ENSG00000134317 grainyhead like transcription factor 1 NA
RP11-799B12.2 ENSG00000264924 NA ENSG00000264924 NA NA
CKS1BP3 ENSG00000268942 NA ENSG00000268942 CDC28 protein kinase regulatory subunit 1B pseudogene 3 NA
HSPE1P2 ENSG00000258645 NA ENSG00000258645 heat shock protein family E (Hsp10) member 1 pseudogene 2 NA
SIX5 147912 The protein encoded by this gene is a homeodomain-containing transcription factor that appears to function in the regulation of organogenesis. This gene is located downstream of the dystrophia myotonica-protein kinase gene. Mutations in this gene are a cause of branchiootorenal syndrome type 2. ENSG00000177045 SIX homeobox 5 NA
GAS2 2620 The protein encoded by this gene is a caspase-3 substrate that plays a role in regulating microfilament and cell shape changes during apoptosis. It can also modulate cell susceptibility to p53-dependent apoptosis by inhibiting calpain activity. Multiple alternatively spliced variants, encoding the same protein, have been identified. ENSG00000148935 growth arrest specific 2 NA
RP11-513G19.1 ENSG00000255968 NA ENSG00000255968 NA NA
CTD-2031P19.4 ENSG00000264281 NA ENSG00000264281 NA NA
RP11-70L8.4 ENSG00000265194 NA ENSG00000265194 NA NA
MGMT 4255 Alkylating agents are potent carcinogens that can result in cell death, mutation and cancer. The protein encoded by this gene is a DNA repair protein that is involved in cellular defense against mutagenesis and toxicity from alkylating agents. The protein catalyzes transfer of methyl groups from O(6)-alkylguanine and other methylated moieties of the DNA to its own molecule, which repairs the toxic lesions. Methylation of the gene’s promoter has been associated with several cancer types, including colorectal cancer, lung cancer, lymphoma and glioblastoma. ENSG00000170430 O-6-methylguanine-DNA methyltransferase NA
GLS2 27165 The protein encoded by this gene is a mitochondrial phosphate-activated glutaminase that catalyzes the hydrolysis of glutamine to stoichiometric amounts of glutamate and ammonia. Originally thought to be liver-specific, this protein has been found in other tissues as well. Alternative splicing results in multiple transcript variants that encode different isoforms. ENSG00000135423 glutaminase 2 NA
TTC39C 125488 NA ENSG00000168234 tetratricopeptide repeat domain 39C NA
PDE8A 5151 The protein encoded by this gene belongs to the cyclic nucleotide phosphodiesterase (PDE) family, and PDE8 subfamily. This PDE hydrolyzes the second messenger, cAMP, which is a regulator and mediator of a number of cellular responses to extracellular signals. Thus, by regulating the cellular concentration of cAMP, this protein plays a key role in many important physiological processes. Alternatively spliced transcript variants encoding different isoforms have been found for this gene. ENSG00000073417 phosphodiesterase 8A NA
NFIB 4781 NA ENSG00000147862 nuclear factor I B NA
MICAL2 9645 NA ENSG00000133816 microtubule associated monooxygenase, calponin and LIM domain containing 2 NA
STK3 6788 This gene encodes a serine/threonine protein kinase activated by proapoptotic molecules indicating the encoded protein functions as a growth suppressor. Cleavage of the protein product by caspase removes the inhibitory C-terminal portion. The N-terminal portion is transported to the nucleus where it homodimerizes to form the active kinase which promotes the condensation of chromatin during apoptosis. Multiple transcript variants encoding different isoforms have been found for this gene. ENSG00000104375 serine/threonine kinase 3 NA
GAMT 2593 The protein encoded by this gene is a methyltransferase that converts guanidoacetate to creatine, using S-adenosylmethionine as the methyl donor. Defects in this gene have been implicated in neurologic syndromes and muscular hypotonia, probably due to creatine deficiency and accumulation of guanidinoacetate in the brain of affected individuals. Two transcript variants encoding different isoforms have been described for this gene. Pseudogenes of this gene are found on chromosomes 2 and 13. ENSG00000130005 guanidinoacetate N-methyltransferase NA
PTP4A1 7803 This gene encodes a member of a small class of prenylated protein tyrosine phosphatases (PTPs), which contain a PTP domain and a characteristic C-terminal prenylation motif. The encoded protein is a cell signaling molecule that plays regulatory roles in a variety of cellular processes, including cell proliferation and migration. The protein may also be involved in cancer development and metastasis. This tyrosine phosphatase is a nuclear protein, but may associate with plasma membrane by means of its prenylation motif. Pseudogenes related to this gene are located on chromosomes 1, 2, 5, 7, 11 and X. ENSG00000112245 protein tyrosine phosphatase type IVA, member 1 NA
# Save the Ensembl IDs queried for factor 8, one ID per line.
write.table(as.factor(out$query),
            paste0("../utilities/GTEX2013_sparse_load_voom/gene_names_clus_", 8, ".txt"),
            col.names = FALSE, row.names = FALSE, quote = FALSE)
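
The write step above is repeated for every factor, and it saves every queried ID, including any that MyGene.info could not resolve (for example, ENSG00000270172 in the Factor 9 table has `notfound` set to TRUE). A minimal sketch of one way to wrap this step follows. It assumes that dropping unresolved IDs is acceptable for the downstream use of these files, which the original analysis does not state. The function name `write_factor_genes` and its arguments are hypothetical, and the toy data frame only mimics the columns shown in the tables.

```r
# Hypothetical helper (not part of the original analysis): write the Ensembl
# IDs that were resolved for one factor, skipping IDs flagged as not found.
write_factor_genes <- function(out, factor_id, out_dir) {
  df <- as.data.frame(out)
  # The `notfound` column only appears when at least one ID failed to map.
  if ("notfound" %in% names(df)) {
    df <- df[is.na(df$notfound) | !df$notfound, , drop = FALSE]
  }
  ids <- unique(as.character(df$query))
  path <- file.path(out_dir, paste0("gene_names_clus_", factor_id, ".txt"))
  write.table(ids, path, col.names = FALSE, row.names = FALSE, quote = FALSE)
  invisible(path)
}

# Toy input with the same column names as the annotation tables.
toy <- data.frame(
  query    = c("ENSG00000124253", "ENSG00000270172", "ENSG00000110169"),
  symbol   = c("PCK1", NA, "HPX"),
  notfound = c(NA, TRUE, NA),
  stringsAsFactors = FALSE
)
path <- write_factor_genes(toy, 9, tempdir())
readLines(path)
```

In the report itself the call would take the real query result and output directory, e.g. `write_factor_genes(out, 9, "../utilities/GTEX2013_sparse_load_voom")`. To reproduce the original behaviour exactly, the `notfound` filter can simply be removed.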

Factor 9 Annotations

# Annotate the top genes of factor 9 via MyGene.info (name, summary, symbol).
out <- mygene::queryMany(gene_list[9, ], scopes = "ensembl.gene",
                         fields = c("name", "summary", "symbol"), species = "human")
## Finished
## Pass returnall=TRUE to return lists of duplicate or missing query terms.
kable(as.data.frame(out))
summary X_id symbol name query notfound
This gene is a main control point for the regulation of gluconeogenesis. The cytosolic enzyme encoded by this gene, along with GTP, catalyzes the formation of phosphoenolpyruvate from oxaloacetate, with the release of carbon dioxide and GDP. The expression of this gene can be regulated by insulin, glucocorticoids, glucagon, cAMP, and diet. Defects in this gene are a cause of cytosolic phosphoenolpyruvate carboxykinase deficiency. A mitochondrial isozyme of the encoded protein also has been characterized. 5105 PCK1 phosphoenolpyruvate carboxykinase 1 ENSG00000124253 NA
This gene encodes the heavy chain subunit of the pre-alpha-trypsin inhibitor complex. This complex may stabilize the extracellular matrix through its ability to bind hyaluronic acid. Polymorphisms of this gene may be associated with increased risk for schizophrenia and major depressive disorder. This gene is present in an inter-alpha-trypsin inhibitor family gene cluster on chromosome 3. 3699 ITIH3 inter-alpha-trypsin inhibitor heavy chain 3 ENSG00000162267 NA
This gene encodes a plasma glycoprotein that binds heme with high affinity. The encoded protein is an acute phase protein that transports heme from the plasma to the liver and may be involved in protecting cells from oxidative stress. 3263 HPX hemopexin ENSG00000110169 NA
This gene encodes a glycoprotein with an approximate molecular weight of 76.5 kDa. It is thought to have been created as a result of an ancient gene duplication event that led to generation of homologous C and N-terminal domains each of which binds one ion of ferric iron. The function of this protein is to transport iron from the intestine, reticuloendothelial system, and liver parenchymal cells to all proliferating cells in the body. This protein may also have a physiologic role as granulocyte/pollen-binding protein (GPBP) involved in the removal of certain organic matter and allergens from serum. 7018 TF transferrin ENSG00000091513 NA
The mitochondrial enzyme encoded by this gene catalyzes synthesis of carbamoyl phosphate from ammonia and bicarbonate. This reaction is the first committed step of the urea cycle, which is important in the removal of excess urea from cells. The encoded protein may also represent a core mitochondrial nucleoid protein. Three transcript variants encoding different isoforms have been found for this gene. The shortest isoform may not be localized to the mitochondrion. Mutations in this gene have been associated with carbamoyl phosphate synthetase deficiency, susceptibility to persistent pulmonary hypertension, and susceptibility to venoocclusive disease after bone marrow transplantation. 1373 CPS1 carbamoyl-phosphate synthase 1 ENSG00000021826 NA
This antimicrobial gene belongs to the cytokine gene family which encode secreted proteins involved in immunoregulatory and inflammatory processes. The protein encoded by this gene is structurally related to the CXC (Cys-X-Cys) subfamily of cytokines. Members of this subfamily are characterized by two cysteines separated by a single amino acid. This cytokine displays chemotactic activity for monocytes but not for lymphocytes, dendritic cells, neutrophils or macrophages. It has been implicated that this cytokine is involved in the homeostasis of monocyte-derived macrophages rather than in inflammation. 9547 CXCL14 C-X-C motif chemokine ligand 14 ENSG00000145824 NA
NA 104326055 APOA1-AS APOA1 antisense RNA ENSG00000235910 NA
This gene encodes a member of the apolipoprotein C1 family. This gene is expressed primarily in the liver, and it is activated when monocytes differentiate into macrophages. The encoded protein plays a central role in high density lipoprotein (HDL) and very low density lipoprotein (VLDL) metabolism. This protein has also been shown to inhibit cholesteryl ester transfer protein in plasma. A pseudogene of this gene is located 4 kb downstream in the same orientation, on the same chromosome. This gene is mapped to chromosome 19, where it resides within a apolipoprotein gene cluster. 341 APOC1 apolipoprotein C1 ENSG00000130208 NA
The protein encoded by this gene is the beta component of fibrinogen, a blood-borne glycoprotein comprised of three pairs of nonidentical polypeptide chains. Following vascular injury, fibrinogen is cleaved by thrombin to form fibrin which is the most abundant component of blood clots. In addition, various cleavage products of fibrinogen and fibrin regulate cell adhesion and spreading, display vasoconstrictor and chemotactic activities, and are mitogens for several cell types. Mutations in this gene lead to several disorders, including afibrinogenemia, dysfibrinogenemia, hypodysfibrinogenemia and thrombotic tendency. Alternatively spliced transcript variants encoding different isoforms have been found for this gene. 2244 FGB fibrinogen beta chain ENSG00000171564 NA
Polyspecific organic cation transporters in the liver, kidney, intestine, and other organs are critical for elimination of many endogenous small organic cations as well as a wide array of drugs and environmental toxins. This gene is one of three similar cation transporter genes located in a cluster on chromosome 6. The encoded protein contains twelve putative transmembrane domains and is a plasma integral membrane protein. Two transcript variants encoding two different isoforms have been found for this gene, but only the longer variant encodes a functional transporter. 6580 SLC22A1 solute carrier family 22 member 1 ENSG00000175003 NA
The protein encoded by this gene is a metalloprotein that binds most of the copper in plasma and is involved in the peroxidation of Fe(II)transferrin to Fe(III) transferrin. Mutations in this gene cause aceruloplasminemia, which results in iron accumulation and tissue damage, and is associated with diabetes and neurologic abnormalities. Two transcript variants, one protein-coding and the other not protein-coding, have been found for this gene. 1356 CP ceruloplasmin (ferroxidase) ENSG00000047457 NA
The protein encoded by this gene belongs to the lipocalin family. It is one of the three subunits that constitutes complement component 8 (C8), which is composed of a disulfide-linked C8 alpha-gamma heterodimer and a non-covalently associated C8 beta chain. C8 participates in the formation of the membrane attack complex (MAC) on bacterial cell membranes. While subunits alpha and beta play a role in complement-mediated bacterial killing, the gamma subunit is not required for the bactericidal activity. 733 C8G complement component 8, gamma polypeptide ENSG00000176919 NA
This gene encodes a protein that contains a ubiquitin associated domain at the N-terminus, an SH3 domain, and a C-terminal domain with similarities to the catalytic motif of phosphoglycerate mutase. The encoded protein was found to inhibit endocytosis of epidermal growth factor receptor (EGFR) and platelet-derived growth factor receptor. 84959 UBASH3B ubiquitin associated and SH3 domain containing B ENSG00000154127 NA
This gene is one of several genes encoding pulmonary-surfactant associated proteins (SFTPA) located on chromosome 10. Mutations in this gene and a highly similar gene located nearby, which affect the highly conserved carbohydrate recognition domain, are associated with idiopathic pulmonary fibrosis. The current version of the assembly displays only a single centromeric SFTPA gene pair rather than the two gene pairs shown in the previous assembly which were thought to have resulted from a duplication. 729238 SFTPA2 surfactant protein A2 ENSG00000185303 NA
The protein encoded by this gene is an enzyme in the catabolic pathway of tyrosine. The encoded protein catalyzes the conversion of 4-hydroxyphenylpyruvate to homogentisate. Defects in this gene are a cause of tyrosinemia type 3 (TYRO3) and hawkinsinuria (HAWK). Two transcript variants encoding different isoforms have been found for this gene. 3242 HPD 4-hydroxyphenylpyruvate dioxygenase ENSG00000158104 NA
This gene encodes a GTPase-activating protein that activates the small guanine-nucleotide-binding protein Rap1 in platelets. The protein interacts with synaptotagmin-like protein 1 and Rab27 and regulates secretion of dense granules from platelets at sites of endothelial damage. Multiple transcript variants encoding different isoforms have been found for this gene. 23108 RAP1GAP2 RAP1 GTPase activating protein 2 ENSG00000132359 NA
This gene encodes the protein core of a seminal plasma proteoglycan containing chondroitin- and heparan-sulfate chains. The protein’s function is unknown, although similarity to thyropin-type cysteine protease-inhibitors suggests its function may be related to protease inhibition. 6695 SPOCK1 sparc/osteonectin, cwcv and kazal-like domains proteoglycan (testican) 1 ENSG00000152377 NA
This gene is a member of the septin gene family of nucleotide binding proteins, originally described in yeast as cell division cycle regulatory proteins. Septins are highly conserved in yeast, Drosophila, and mouse and appear to regulate cytoskeletal organization. Disruption of septin function disturbs cytokinesis and results in large multinucleate or polyploid cells. This gene is mapped to 22q11, the region frequently deleted in DiGeorge and velocardiofacial syndromes. A translocation involving the MLL gene and this gene has also been reported in patients with acute myeloid leukemia. Alternative splicing results in multiple transcript variants. The presence of a non-consensus polyA signal (AACAAT) in this gene also results in read-through transcription into the downstream neighboring gene (GP1BB; platelet glycoprotein Ib), whereby larger, non-coding transcripts are produced. 5413 SEPT5 septin 5 ENSG00000184702 NA
NA 90139 TSPAN18 tetraspanin 18 ENSG00000157570 NA
NA 8608 RDH16 retinol dehydrogenase 16 (all-trans) ENSG00000139547 NA
This gene encodes a preproprotein, which is processed to yield both alpha and beta chains, which subsequently combine as a tetramer to produce haptoglobin. Haptoglobin functions to bind free plasma hemoglobin, which allows degradative enzymes to gain access to the hemoglobin, while at the same time preventing loss of iron through the kidneys and protecting the kidneys from damage by hemoglobin. Mutations in this gene and/or its regulatory regions cause ahaptoglobinemia or hypohaptoglobinemia. This gene has also been linked to diabetic nephropathy, the incidence of coronary artery disease in type 1 diabetes, Crohn’s disease, inflammatory disease behavior, primary sclerosing cholangitis, susceptibility to idiopathic Parkinson’s disease, and a reduced incidence of Plasmodium falciparum malaria. The protein encoded also exhibits antimicrobial activity against bacteria. A similar duplicated gene is located next to this gene on chromosome 16. Multiple transcript variants encoding different isoforms have been found for this gene. 3240 HP haptoglobin ENSG00000257017 NA
NA 9645 MICAL2 microtubule associated monooxygenase, calponin and LIM domain containing 2 ENSG00000133816 NA
Protein kinase C (PKC) zeta is a member of the PKC family of serine/threonine kinases which are involved in a variety of cellular processes such as proliferation, differentiation and secretion. Unlike the classical PKC isoenzymes which are calcium-dependent, PKC zeta exhibits a kinase activity which is independent of calcium and diacylglycerol but not of phosphatidylserine. Furthermore, it is insensitive to typical PKC inhibitors and cannot be activated by phorbol ester. Unlike the classical PKC isoenzymes, it has only a single zinc finger module. These structural and biochemical properties indicate that the zeta subspecies is related to, but distinct from other isoenzymes of PKC. Alternative splicing results in multiple transcript variants encoding different isoforms. 5590 PRKCZ protein kinase C zeta ENSG00000067606 NA
NA ENSG00000269934 RP5-1139B12.3 NA ENSG00000269934 NA
This gene encodes a protein which binds with glycosaminoglycans to form part of the extracellular matrix. The protein contains thyroglobulin type-1, follistatin-like, and calcium-binding domains, and has glycosaminoglycan attachment sites in the acidic C-terminal region. Three alternatively spliced transcript variants that encode different protein isoforms have been described for this gene. 9806 SPOCK2 sparc/osteonectin, cwcv and kazal-like domains proteoglycan (testican) 2 ENSG00000107742 NA
The protein encoded by this gene is a plasma glycoprotein of unknown function. The protein shows sequence similarity to the variable regions of some immunoglobulin supergene family member proteins. 1 A1BG alpha-1-B glycoprotein ENSG00000121410 NA
NA 11123 RCAN3 RCAN family member 3 ENSG00000117602 NA
NA ENSG00000214425 LRRC37A4P leucine-rich repeat containing 37 member A4, pseudogene ENSG00000214425 NA
The protein encoded by this gene is a member of the alcohol dehydrogenase family. Members of this enzyme family metabolize a wide variety of substrates, including ethanol, retinol, other aliphatic alcohols, hydroxysteroids, and lipid peroxidation products. This encoded protein, consisting of several homo- and heterodimers of alpha, beta, and gamma subunits, exhibits high activity for ethanol oxidation and plays a major role in ethanol catabolism. Three genes encoding alpha, beta and gamma subunits are tandemly organized in a genomic segment as a gene cluster. Two transcript variants encoding different isoforms have been found for this gene. 125 ADH1B alcohol dehydrogenase 1B (class I), beta polypeptide ENSG00000196616 NA
The protein encoded by this gene is a bifunctional enzyme that channels 1-carbon units from formiminoglutamate, a metabolite of the histidine degradation pathway, to the folate pool. Mutations in this gene are associated with glutamate formiminotransferase deficiency. Alternatively spliced transcript variants have been found for this gene. 10841 FTCD formimidoyltransferase cyclodeaminase ENSG00000160282 NA
NA 55908 ANGPTL8 angiopoietin like 8 ENSG00000130173 NA
This gene encodes a nuclear protein with three C2H2-type zinc fingers, and functions as a transcriptional repressor. Chromosomal aberrations involving this gene are associated with endometrial stromal tumors. Alternatively spliced variants which encode different protein isoforms have been described; however, not all variants have been fully characterized 221895 JAZF1 JAZF zinc finger 1 ENSG00000153814 NA
Tight junctions represent one mode of cell-to-cell adhesion in epithelial or endothelial cell sheets, forming continuous seals around cells and serving as a physical barrier to prevent solutes and water from passing freely through the paracellular space. These junctions are comprised of sets of continuous networking strands in the outwardly facing cytoplasmic leaflet, with complementary grooves in the inwardly facing extracytoplasmic leaflet. The protein encoded by this gene, a member of the claudin family, is an integral membrane protein and a component of tight junction strands. Loss of function mutations result in neonatal ichthyosis-sclerosing cholangitis syndrome. 9076 CLDN1 claudin 1 ENSG00000163347 NA
The protein encoded by this gene contains a pleckstrin homology (PH) domain and an oxysterol-binding region. It binds oxysterols such as 7-ketocholesterol and may inhibit their cytotoxicity. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene. 23762 OSBP2 oxysterol binding protein 2 ENSG00000184792 NA
Syntaxin-1, synaptobrevin/VAMP, and SNAP25 interact to form the SNARE complex, which is required for synaptic vesicle docking and fusion. The protein encoded by this gene is membrane-associated and inhibits SNARE complex formation by binding free syntaxin-1. Expression of this gene appears to be brain-specific. Alternative splicing results in multiple transcript variants encoding different isoforms. 9751 SNPH syntaphilin ENSG00000101298 NA
This gene encodes a member of a family of UDP-GalNAc:polypeptide N-acetylgalactosaminyltransferases, which catalyze the transfer of N-acetylgalactosamine (GalNAc) from UDP-GalNAc to a serine or threonine residue on a polypeptide acceptor in the initial step of O-linked protein glycosylation. Mutations in this gene are associated with an increased susceptibility to colorectal cancer. 79695 GALNT12 polypeptide N-acetylgalactosaminyltransferase 12 ENSG00000119514 NA
NA 51560 RAB6B RAB6B, member RAS oncogene family ENSG00000154917 NA
This receptor binds insulin-like growth factor with a high affinity. It has tyrosine kinase activity. The insulin-like growth factor I receptor plays a critical role in transformation events. Cleavage of the precursor generates alpha and beta subunits. It is highly overexpressed in most malignant tissues where it functions as an anti-apoptotic agent by enhancing cell survival. Alternatively spliced transcript variants encoding distinct isoforms have been found for this gene. 3480 IGF1R insulin like growth factor 1 receptor ENSG00000140443 NA
NA 9764 KIAA0513 KIAA0513 ENSG00000135709 NA
NA ENSG00000189316 RP11-797H7.5 NA ENSG00000189316 NA
NA ENSG00000215861 WI2-1896O14.1 NA ENSG00000215861 NA
This gene encodes a protein thought to be a component of the radial spoke head in motile cilia and flagella. Mutations in this gene are associated with primary ciliary dyskinesia 12. Alternative splicing results in multiple transcript variants. 221421 RSPH9 radial spoke head 9 homolog ENSG00000172426 NA
This protein belongs to the aldehyde dehydrogenase family of proteins. This enzyme is a mitochondrial matrix NAD-dependent dehydrogenase which catalyzes the second step of the proline degradation pathway, converting pyrroline-5-carboxylate to glutamate. Deficiency of this enzyme is associated with type II hyperprolinemia, an autosomal recessive disorder characterized by accumulation of delta-1-pyrroline-5-carboxylate (P5C) and proline. Alternatively spliced transcript variants encoding different isoforms have been identified for this gene. 8659 ALDH4A1 aldehyde dehydrogenase 4 family member A1 ENSG00000159423 NA
NA 161145 TMEM229B transmembrane protein 229B ENSG00000198133 NA
The protein encoded by this gene is localized to the nucleus of endothelial cells and is induced by IL-1 and TNF-alpha stimulation. Studies in rat cardiomyocytes suggest that this gene functions as a transcription factor. Interactions between this protein and the sarcomeric proteins myopalladin and titin suggest that it may also be involved in the myofibrillar stretch-sensor system. 27063 ANKRD1 ankyrin repeat domain 1 ENSG00000148677 NA
NA 27124 INPP5J inositol polyphosphate-5-phosphatase J ENSG00000185133 NA
NA 9911 TMCC2 transmembrane and coiled-coil domain family 2 ENSG00000133069 NA
This gene produces alternative transcripts encoding two distinct proteins. One protein is a transcriptional repressor, while the other isoform is a major component of specialized synapses known as synaptic ribbons. Both proteins contain a NAD+ binding domain similar to NAD+-dependent 2-hydroxyacid dehydrogenases. A portion of the 3’ untranslated region was used to map this gene to chromosome 21q21.3; however, it was noted that similar loci elsewhere in the genome are likely. Blast analysis shows that this gene is present on chromosome 10. Several transcript variants encoding two different isoforms have been found for this gene. 1488 CTBP2 C-terminal binding protein 2 ENSG00000175029 NA
This gene encodes a member of the gap junction protein family. The gap junctions were first characterized by electron microscopy as regionally specialized structures on plasma membranes of contacting adherent cells. These structures were shown to consist of cell-to-cell channels that facilitate the transfer of ions and small molecules between cells. The gap junction proteins, also known as connexins, purified from fractions of enriched gap junctions from different tissues differ. According to sequence similarities at the nucleotide and amino acid levels, the gap junction proteins are divided into two categories, alpha and beta. Mutations in this gene are responsible for as much as 50% of pre-lingual, recessive deafness. 2706 GJB2 gap junction protein beta 2 ENSG00000165474 NA
NA 3797 KIF3C kinesin family member 3C ENSG00000084731 NA
NA ENSG00000254680 RP11-265D17.2 NA ENSG00000254680 NA
IGSF4B is a brain-specific protein related to the calcium-independent cell-cell adhesion molecules known as nectins (see PVRL3; MIM 607147) (Kakunaga et al., 2005 [PubMed 15741237]). 57863 CADM3 cell adhesion molecule 3 ENSG00000162706 NA
NA ENSG00000268230 CTD-2619J13.8 NA ENSG00000268230 NA
NA ENSG00000261172 RP11-356C4.5 NA ENSG00000261172 NA
This gene encodes apolipoprotein A-I, which is the major protein component of high density lipoprotein (HDL) in plasma. The encoded preproprotein is proteolytically processed to generate the mature protein, which promotes cholesterol efflux from tissues to the liver for excretion, and is a cofactor for lecithin cholesterolacyltransferase (LCAT), an enzyme responsible for the formation of most plasma cholesteryl esters. This gene is closely linked with two other apolipoprotein genes on chromosome 11. Defects in this gene are associated with HDL deficiencies, including Tangier disease, and with systemic non-neuropathic amyloidosis. Alternative splicing results in multiple transcript variants, at least one of which encodes a preproprotein. 335 APOA1 apolipoprotein A1 ENSG00000118137 NA
This gene belongs to the family of reticulon encoding genes. Reticulons are associated with the endoplasmic reticulum, and are involved in neuroendocrine secretion or in membrane trafficking in neuroendocrine cells. This gene is considered to be a specific marker for neurological diseases and cancer, and is a potential molecular target for therapy. Alternative splicing results in multiple transcript variants. 6252 RTN1 reticulon 1 ENSG00000139970 NA
This gene is specifically expressed in the central nervous system (CNS). It encodes a member of the DOCK (dedicator of cytokinesis) family of guanine nucleotide exchange factors (GEFs). This protein, dedicator of cytokinesis 3 (DOCK3), is also known as modifier of cell adhesion (MOCA) and presenilin-binding protein (PBP). The DOCK3 and DOCK1, -2 and -4 share several conserved amino acids in their DHR-2 (DOCK homology region 2) domains that are required for GEF activity, and bind directly to WAVE proteins [Wiskott-Aldrich syndrome protein (WASP) family Verprolin-homologous proteins] via their DHR-1 domains. The DOCK3 induces axonal outgrowth in CNS by stimulating membrane recruitment of the WAVE complex and activating the small G protein Rac1. This gene is associated with an attention deficit hyperactivity disorder-like phenotype by a complex chromosomal rearrangement. 1795 DOCK3 dedicator of cytokinesis 3 ENSG00000088538 NA
This gene encodes a type I transmembrane protein that is localized to junctional complexes between endothelial and epithelial cells and may have a role in cell-cell adhesion. Expression of this gene in white adipose tissue is implicated in adipocyte maturation and development of obesity. This gene is also essential for normal intestinal development and mutations in the gene are associated with congenital short bowel syndrome. 79827 CLMP CXADR-like membrane protein ENSG00000166250 NA
The protein encoded by this gene is a homeobox-containing transcription factor of the POU domain family. The encoded protein binds the octamer sequence 5’-ATTTGCAT-3’, a common transcription factor binding site in immunoglobulin gene promoters. Several transcript variants encoding different isoforms have been found for this gene. 5452 POU2F2 POU class 2 homeobox 2 ENSG00000028277 NA
NA ENSG00000232320 AC009299.5 NA ENSG00000232320 NA
NA 100874235 CACNA1C-AS2 CACNA1C antisense RNA 2 ENSG00000256271 NA
This gene belongs to the ephrin receptor subfamily of the protein-tyrosine kinase family. EPH and EPH-related receptors have been implicated in mediating developmental events, particularly in the nervous system. Receptors in the EPH subfamily typically have a single kinase domain and an extracellular region containing a Cys-rich domain and 2 fibronectin type III repeats. The ephrin receptors are divided into 2 groups based on the similarity of their extracellular domain sequences and their affinities for binding ephrin-A and ephrin-B ligands. Multiple transcript variants encoding different isoforms have been found for this gene. 2043 EPHA4 EPH receptor A4 ENSG00000116106 NA
The protein encoded by this gene is a member of the p55 Stardust family of membrane-associated guanylate kinase (MAGUK) proteins, which function in the establishment of epithelial cell polarity. This family member forms a complex with the polarity protein DLG1 (discs, large homolog 1) and facilitates epithelial cell polarity and tight junction formation. Polymorphisms in this gene are associated with variations in site-specific bone mineral density (BMD). Alternative splicing results in multiple transcript variants. 143098 MPP7 membrane palmitoylated protein 7 ENSG00000150054 NA
NA 23127 COLGALT2 collagen beta(1-O)galactosyltransferase 2 ENSG00000198756 NA
This gene encodes a protein similar to the rat neuronal pentraxin receptor. The rat pentraxin receptor is an integral membrane protein that is thought to mediate neuronal uptake of the snake venom toxin, taipoxin, and its transport into the synapses. Studies in rat indicate that translation of this mRNA initiates at a non-AUG (CUG) codon. This may also be true for mouse and human, based on strong sequence conservation amongst these species. 23467 NPTXR neuronal pentraxin receptor ENSG00000221890 NA
This gene encodes a member of the HOMER family of postsynaptic density scaffolding proteins that share a similar domain structure consisting of an N-terminal Enabled/vasodilator-stimulated phosphoprotein homology 1 domain which mediates protein-protein interactions, and a carboxy-terminal coiled-coil domain and two leucine zipper motifs that are involved in self-oligomerization. The encoded protein binds numerous other proteins including group I metabotropic glutamate receptors, inositol 1,4,5-trisphosphate receptors and amyloid precursor proteins and has been implicated in diverse biological functions such as neuronal signaling, T-cell activation and trafficking of amyloid beta peptides. Alternative splicing results in multiple transcript variants. 9454 HOMER3 homer scaffolding protein 3 ENSG00000051128 NA
The gene is a member of the syntaxin family. The encoded protein is targeted to the apical membrane of epithelial cells where it forms clusters and is important in establishing and maintaining polarity necessary for protein trafficking involving vesicle fusion and exocytosis. Alternative splicing results in multiple transcript variants. 6809 STX3 syntaxin 3 ENSG00000166900 NA
The obscurin gene spans more than 150 kb, contains over 80 exons and encodes a protein of approximately 720 kDa. The encoded protein contains 68 Ig domains, 2 fibronectin domains, 1 calcium/calmodulin-binding domain, 1 RhoGEF domain with an associated PH domain, and 2 serine-threonine kinase domains. This protein belongs to the family of giant sarcomeric signaling proteins that includes titin and nebulin, and may have a role in the organization of myofibrils during assembly and may mediate interactions between the sarcoplasmic reticulum and myofibrils. Alternatively spliced transcript variants encoding different isoforms have been identified. 84033 OBSCN obscurin, cytoskeletal calmodulin and titin-interacting RhoGEF ENSG00000154358 NA
The leucine-rich repeat (LRR) family of proteins, including LRG1, have been shown to be involved in protein-protein interaction, signal transduction, and cell adhesion and development. LRG1 is expressed during granulocyte differentiation (O’Donnell et al., 2002 [PubMed 12223515]). 116844 LRG1 leucine rich alpha-2-glycoprotein 1 ENSG00000171236 NA
NA 57214 CEMIP cell migration inducing hyaluronan binding protein ENSG00000103888 NA
Hexokinases phosphorylate glucose to produce glucose-6-phosphate, the first step in most glucose metabolism pathways. This gene encodes a ubiquitous form of hexokinase which localizes to the outer membrane of mitochondria. Mutations in this gene have been associated with hemolytic anemia due to hexokinase deficiency. Alternative splicing of this gene results in several transcript variants which encode different isoforms, some of which are tissue-specific. 3098 HK1 hexokinase 1 ENSG00000156515 NA
NA ENSG00000227227 AC017101.10 NA ENSG00000227227 NA
NA NA NA NA ENSG00000270172 TRUE
NA ENSG00000252464 RN7SKP70 RNA, 7SK small nuclear pseudogene 70 ENSG00000252464 NA
NA 115572 FAM46B family with sequence similarity 46 member B ENSG00000158246 NA
This gene encodes a member of the annexin family. Members of this calcium-dependent phospholipid-binding protein family play a role in the regulation of cellular growth and in signal transduction pathways. This protein functions in the inhibition of phospholipase A2 and cleavage of inositol 1,2-cyclic phosphate to form inositol 1-phosphate. This protein may also play a role in anti-coagulation. 306 ANXA3 annexin A3 ENSG00000138772 NA
Polyspecific organic cation transporters in the liver, kidney, intestine, and other organs are critical for elimination of many endogenous small organic cations as well as a wide array of drugs and environmental toxins. The encoded protein is an organic cation transporter and plasma integral membrane protein containing eleven putative transmembrane domains as well as a nucleotide-binding site motif. Transport by this protein is at least partially ATP-dependent. 6583 SLC22A4 solute carrier family 22 member 4 ENSG00000197208 NA
This locus encodes a sulfotransferase protein. The encoded enzyme catalyzes the sulfation of a nonreducing N-acetylglucosamine residue, and may play a role in biosynthesis of 6-sulfosialyl Lewis X antigen. 9435 CHST2 carbohydrate sulfotransferase 2 ENSG00000175040 NA
This gene encodes a membrane-bound protein which is a member of the ELO family, proteins which participate in the biosynthesis of fatty acids. Consistent with the expression of the encoded protein in photoreceptor cells of the retina, mutations and small deletions in this gene are associated with Stargardt-like macular dystrophy (STGD3) and autosomal dominant Stargardt-like macular dystrophy (ADMD), also referred to as autosomal dominant atrophic macular degeneration. 6785 ELOVL4 ELOVL fatty acid elongase 4 ENSG00000118402 NA
This gene encodes an inwardly rectifying K+ channel which may be blocked by divalent cations. This protein is thought to be one of multiple inwardly rectifying channels which contribute to the cardiac inward rectifier current (IK1). The gene is located within the Smith-Magenis syndrome region on chromosome 17. 3768 KCNJ12 potassium voltage-gated channel subfamily J member 12 ENSG00000184185 NA
This gene encodes a protein associated with the cytoplasmic surface of synaptic vesicles. A subset of patients with stiff-man syndrome who were also affected by breast cancer are positive for autoantibodies against this protein. Alternate splicing of this gene results in two transcript variants encoding different isoforms. Additional splice variants have been described, but their full length sequences have not been determined. A pseudogene of this gene is found on chromosome 11. 273 AMPH amphiphysin ENSG00000078053 NA
NA 84940 CORO6 coronin 6 ENSG00000167549 NA
This gene is a member of the aggrecan/versican proteoglycan family. The protein encoded is a large chondroitin sulfate proteoglycan and is a major component of the extracellular matrix. This protein is involved in cell adhesion, proliferation, migration and angiogenesis and plays a central role in tissue morphogenesis and maintenance. Mutations in this gene are the cause of Wagner syndrome type 1. Multiple transcript variants encoding different isoforms have been found for this gene. 1462 VCAN versican ENSG00000038427 NA
This gene encodes a filamentous actin-binding protein that may function in cell adhesion and migration. Mutations in this gene have been associated with dilated cardiomyopathy, also known as CMD1CC. Alternatively spliced transcript variants have been described. 91624 NEXN nexilin F-actin binding protein ENSG00000162614 NA
NA NA NA NA ENSG00000229874 TRUE
NA ENSG00000239775 AC017116.11 NA ENSG00000239775 NA
The protein encoded by this gene belongs to the B-cell CLL/lymphoma 2 and adenovirus E1B 19 kDa interacting family, whose members play roles in many cellular processes including apoptosis, cell transformation, and synaptic function. Several functions for this protein have been demonstrated including suppression of Ras homolog family member A activity, which results in reduced stress fiber formation and suppression of oncogenic cellular transformation. A high molecular weight isoform of this protein has also been shown to colocalize with Adaptor protein complex 2, beta-Adaptin and endodermal markers, suggesting an involvement in post-endocytic trafficking. In prostate cancer cells, this gene acts as a tumor suppressor and its expression is regulated by prostate cancer antigen 3, a non-protein coding gene on the opposite DNA strand in an intron of this gene. Prostate cancer antigen 3 regulates levels of this gene through formation of a double-stranded RNA that undergoes adenosine deaminase acting on RNA-dependent adenosine-to-inosine RNA editing. Alternative splicing results in multiple transcript variants. 158471 PRUNE2 prune homolog 2 ENSG00000106772 NA
NA 55365 TMEM176A transmembrane protein 176A ENSG00000002933 NA
NA 10570 DPYSL4 dihydropyrimidinase like 4 ENSG00000151640 NA
This gene encodes a member of the latrophilin subfamily of G-protein coupled receptors (GPCR). Latrophilins may function in both cell adhesion and signal transduction. In experiments with non-human species, endogenous proteolytic cleavage within a cysteine-rich GPS (G-protein-coupled-receptor proteolysis site) domain resulted in two subunits (a large extracellular N-terminal cell adhesion subunit and a subunit with substantial similarity to the secretin/calcitonin family of GPCRs) being non-covalently bound at the cell membrane. Latrophilin-1 has been shown to recruit the neurotoxin from black widow spider venom, alpha-latrotoxin, to the synapse plasma membrane. Alternative splicing results in multiple variants encoding distinct isoforms. 22859 ADGRL1 adhesion G protein-coupled receptor L1 ENSG00000072071 NA
NA 57210 SLC45A4 solute carrier family 45 member 4 ENSG00000022567 NA
This gene encodes an enzyme involved in catalyzing the conversion of angiotensin I into a physiologically active peptide angiotensin II. Angiotensin II is a potent vasopressor and aldosterone-stimulating peptide that controls blood pressure and fluid-electrolyte balance. This enzyme plays a key role in the renin-angiotensin system. Many studies have associated the presence or absence of a 287 bp Alu repeat element in this gene with the levels of circulating enzyme or cardiovascular pathophysiologies. Multiple alternatively spliced transcript variants encoding different isoforms have been identified, and two most abundant spliced variants encode the somatic form and the testicular form, respectively, that are equally active. 1636 ACE angiotensin I converting enzyme ENSG00000159640 NA
This gene encodes a member of the VPS10-related sortilin family of proteins. The encoded preproprotein is proteolytically processed by furin to generate the mature receptor. This receptor plays a role in the trafficking of different proteins to either the cell surface, or subcellular compartments such as lysosomes and endosomes. Expression levels of this gene may influence the risk of myocardial infarction in human patients. Alternative splicing results in multiple transcript variants. 6272 SORT1 sortilin 1 ENSG00000134243 NA
NA 64753 CCDC136 coiled-coil domain containing 136 ENSG00000128596 NA
G-protein signaling modulators (GPSMs) play diverse functional roles through their interaction with G-protein subunits. This gene encodes a receptor-independent activator of G protein signaling, which is one of several factors that influence the basal activity of G-protein signaling systems. The protein contains seven tetratricopeptide repeats in its N-terminal half and four G-protein regulatory (GPR) motifs in its C-terminal half. Multiple alternatively spliced transcript variants encoding different isoforms have been found for this gene. 26086 GPSM1 G-protein signaling modulator 1 ENSG00000160360 NA
The Bloom syndrome gene product is related to the RecQ subset of DExH box-containing DNA helicases and has both DNA-stimulated ATPase and ATP-dependent DNA helicase activities. Mutations causing Bloom syndrome delete or alter helicase motifs and may disable the 3’-5’ helicase activity. The normal protein may act to suppress inappropriate recombination. 641 BLM Bloom syndrome RecQ like helicase ENSG00000197299 NA
The phosphatidylethanolamine (PE)-binding proteins, including PEBP4, are an evolutionarily conserved family of proteins with pivotal biologic functions, such as lipid binding and inhibition of serine proteases (Wang et al., 2004 [PubMed 15302887]). 157310 PEBP4 phosphatidylethanolamine binding protein 4 ENSG00000134020 NA
This gene encodes a member of the cytochrome P450 superfamily of enzymes. The cytochrome P450 proteins are monooxygenases which catalyze many reactions involved in drug metabolism and synthesis of cholesterol, steroids and other lipids. This protein localizes to the endoplasmic reticulum and is known to metabolize as many as 25% of commonly prescribed drugs. Its substrates include antidepressants, antipsychotics, analgesics and antitussives, beta adrenergic blocking agents, antiarrhythmics and antiemetics. The gene is highly polymorphic in the human population; certain alleles result in the poor metabolizer phenotype, characterized by a decreased ability to metabolize the enzyme’s substrates. Some individuals with the poor metabolizer phenotype have no functional protein since they carry 2 null alleles whereas in other individuals the gene is absent. This gene can vary in copy number and individuals with the ultrarapid metabolizer phenotype can have 3 or more active copies of the gene. Alternatively spliced transcript variants encoding different isoforms have been found for this gene. 1565 CYP2D6 cytochrome P450 family 2 subfamily D member 6 ENSG00000100197 NA
This gene encodes a protein that belongs to the microtubule-associated protein family. The proteins of this family are thought to be involved in microtubule assembly, which is an essential step in neurogenesis. The product of this gene is a precursor polypeptide that presumably undergoes proteolytic processing to generate the final MAP1A heavy chain and LC2 light chain. Expression of this gene is almost exclusively in the brain. Studies of the rat microtubule-associated protein 1A gene suggested a role in early events of spinal cord development. 4130 MAP1A microtubule associated protein 1A ENSG00000166963 NA
The protein encoded by this gene is secreted and is a serine protease inhibitor whose targets include elastase, plasmin, thrombin, trypsin, chymotrypsin, and plasminogen activator. Defects in this gene can cause emphysema or liver disease. Several transcript variants encoding the same protein have been found for this gene. 5265 SERPINA1 serpin family A member 1 ENSG00000197249 NA
# Save the Ensembl IDs queried for factor 9, one per line
write.table(out$query,
            file = paste0("../utilities/GTEX2013_sparse_load_voom/gene_names_clus_", 9, ".txt"),
            col.names = FALSE, row.names = FALSE, quote = FALSE)
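
The annotation tables in this report come back with their columns in a different order from one factor to the next, and the `notfound` column is present only when some ID fails to resolve. The sketch below is one possible way to put every table into the same column order before passing it to `kable()`. The helper name `tidy_annotations` and the chosen column order are assumptions made for illustration and are not part of the original analysis; the toy input stands in for a `mygene::queryMany()` result so the example runs without network access.

```r
# Hypothetical helper (not from the original analysis): put the columns of an
# annotation table in a fixed order so tables for different factors line up.
tidy_annotations <- function(out,
                             cols = c("query", "symbol", "name", "X_id",
                                      "summary", "notfound")) {
  df <- as.data.frame(out, stringsAsFactors = FALSE)
  # Add any expected column that is absent, filled with NA
  for (col in setdiff(cols, names(df))) {
    df[[col]] <- rep(NA, nrow(df))
  }
  # Keep only the expected columns, in the requested order
  df[, cols, drop = FALSE]
}

# Toy input standing in for a queryMany() result; the two rows reuse
# values that appear in the Factor 10 table of this report.
toy <- data.frame(
  symbol = c("HSPA6", NA),
  query  = c("ENSG00000173110", "ENSG00000182368"),
  X_id   = c("3310", NA),
  name   = c("heat shock protein family A (Hsp70) member 6", NA),
  stringsAsFactors = FALSE
)

tidy <- tidy_annotations(toy)
print(names(tidy))
```

With a real query result, `kable(tidy_annotations(out))` would replace `kable(as.data.frame(out))`; rows with no annotation keep their NA entries rather than being dropped.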

Factor 10 Annotations

out <- mygene::queryMany(gene_list[10,],  scopes="ensembl.gene", fields=c("name", "summary", "symbol"), species="human");
## Finished
## Pass returnall=TRUE to return lists of duplicate or missing query terms.
kable(as.data.frame(out))
symbol query X_id name summary notfound
HSPA6 ENSG00000173110 3310 heat shock protein family A (Hsp70) member 6 NA NA
HILPDA ENSG00000135245 29923 hypoxia inducible lipid droplet associated NA NA
RP11-155G14.6 ENSG00000240758 ENSG00000240758 NA NA NA
PTGS2 ENSG00000073756 5743 prostaglandin-endoperoxide synthase 2 Prostaglandin-endoperoxide synthase (PTGS), also known as cyclooxygenase, is the key enzyme in prostaglandin biosynthesis, and acts both as a dioxygenase and as a peroxidase. There are two isozymes of PTGS: a constitutive PTGS1 and an inducible PTGS2, which differ in their regulation of expression and tissue distribution. This gene encodes the inducible isozyme. It is regulated by specific stimulatory events, suggesting that it is responsible for the prostanoid biosynthesis involved in inflammation and mitogenesis. NA
ZC3H12A ENSG00000163874 80149 zinc finger CCCH-type containing 12A ZC3H12A is an MCP1 (CCL2; MIM 158105)-induced protein that acts as a transcriptional activator and causes cell death of cardiomyocytes, possibly via induction of genes associated with apoptosis. NA
KRT8P50 ENSG00000260799 ENSG00000260799 keratin 8 pseudogene 50 NA NA
HSPA1B ENSG00000204388 3304 heat shock protein family A (Hsp70) member 1B This intronless gene encodes a 70kDa heat shock protein which is a member of the heat shock protein 70 family. In conjunction with other heat shock proteins, this protein stabilizes existing proteins against aggregation and mediates the folding of newly translated proteins in the cytosol and in organelles. It is also involved in the ubiquitin-proteasome pathway through interaction with the AU-rich element RNA-binding protein 1. The gene is located in the major histocompatibility complex class III region, in a cluster with two closely related genes which encode similar proteins. NA
SOCS3 ENSG00000184557 9021 suppressor of cytokine signaling 3 This gene encodes a member of the STAT-induced STAT inhibitor (SSI), also known as suppressor of cytokine signaling (SOCS), family. SSI family members are cytokine-inducible negative regulators of cytokine signaling. The expression of this gene is induced by various cytokines, including IL6, IL10, and interferon (IFN)-gamma. The protein encoded by this gene can bind to JAK2 kinase, and inhibit the activity of JAK2 kinase. Studies of the mouse counterpart of this gene suggested the roles of this gene in the negative regulation of fetal liver hematopoiesis, and placental development. NA
RGS2 ENSG00000116741 5997 regulator of G-protein signaling 2 Regulator of G protein signaling (RGS) family members are regulatory molecules that act as GTPase activating proteins (GAPs) for G alpha subunits of heterotrimeric G proteins. RGS proteins are able to deactivate G protein subunits of the Gi alpha, Go alpha and Gq alpha subtypes. They drive G proteins into their inactive GDP-bound forms. Regulator of G protein signaling 2 belongs to this family. The protein acts as a mediator of myeloid differentiation and may play a role in leukemogenesis. NA
IER3 ENSG00000137331 8870 immediate early response 3 This gene functions in the protection of cells from Fas- or tumor necrosis factor type alpha-induced apoptosis. Partially degraded and unspliced transcripts are found after virus infection in vitro, but these transcripts are not found in vivo and do not generate a valid protein. NA
SERBP1P3 ENSG00000242142 ENSG00000242142 SERPINE1 mRNA binding protein 1 pseudogene 3 NA NA
ARID5A ENSG00000196843 10865 AT-rich interaction domain 5A Members of the ARID protein family, including ARID5A, have diverse functions but all appear to play important roles in development, tissue-specific gene expression, and regulation of cell growth (Patsialou et al., 2005 [PubMed 15640446]). NA
CHRNE ENSG00000108556 1145 cholinergic receptor nicotinic epsilon subunit Acetylcholine receptors at mature mammalian neuromuscular junctions are pentameric protein complexes composed of four subunits in the ratio of two alpha subunits to one beta, one epsilon, and one delta subunit. The acetylcholine receptor changes subunit composition shortly after birth when the epsilon subunit replaces the gamma subunit seen in embryonic receptors. Mutations in the epsilon subunit are associated with congenital myasthenic syndrome. NA
BCL3 ENSG00000069399 602 B-cell CLL/lymphoma 3 This gene is a proto-oncogene candidate. It is identified by its translocation into the immunoglobulin alpha-locus in some cases of B-cell leukemia. The protein encoded by this gene contains seven ankyrin repeats, which are most closely related to those found in I kappa B proteins. This protein functions as a transcriptional co-activator that activates through its association with NF-kappa B homodimers. The expression of this gene can be induced by NF-kappa B, which forms a part of the autoregulatory loop that controls the nuclear residence of p50 NF-kappa B. NA
FOSL1 ENSG00000175592 8061 FOS like 1, AP-1 transcription factor subunit The Fos gene family consists of 4 members: FOS, FOSB, FOSL1, and FOSL2. These genes encode leucine zipper proteins that can dimerize with proteins of the JUN family, thereby forming the transcription factor complex AP-1. As such, the FOS proteins have been implicated as regulators of cell proliferation, differentiation, and transformation. Several transcript variants encoding different isoforms have been found for this gene. NA
CSRNP1 ENSG00000144655 64651 cysteine and serine rich nuclear protein 1 This gene encodes a protein that localizes to the nucleus and expression of this gene is induced in response to elevated levels of axin. The Wnt signalling pathway, which is negatively regulated by axin, is important in axis formation in early development and impaired regulation of this signalling pathway is often involved in tumors. A decreased level of expression of this gene in tumors compared to the level of expression in their corresponding normal tissues suggests that this gene product has a tumor suppressor function. Alternative splicing results in multiple transcript variants. NA
LOC105379695 ENSG00000272273 105379695 uncharacterized LOC105379695 NA NA
RP11-456P18.2 ENSG00000229808 ENSG00000229808 NA NA NA
C3orf52 ENSG00000114529 79669 chromosome 3 open reading frame 52 NA NA
CDKN1A ENSG00000124762 1026 cyclin-dependent kinase inhibitor 1A This gene encodes a potent cyclin-dependent kinase inhibitor. The encoded protein binds to and inhibits the activity of cyclin-cyclin-dependent kinase2 or -cyclin-dependent kinase4 complexes, and thus functions as a regulator of cell cycle progression at G1. The expression of this gene is tightly controlled by the tumor suppressor protein p53, through which this protein mediates the p53-dependent cell cycle G1 phase arrest in response to a variety of stress stimuli. This protein can interact with proliferating cell nuclear antigen, a DNA polymerase accessory factor, and plays a regulatory role in S phase DNA replication and DNA damage repair. This protein was reported to be specifically cleaved by CASP3-like caspases, which thus leads to a dramatic activation of cyclin-dependent kinase2, and may be instrumental in the execution of apoptosis following caspase activation. Mice that lack this gene have the ability to regenerate damaged or missing tissue. Multiple alternatively spliced variants have been found for this gene. NA
NFKB2 ENSG00000077150 4791 nuclear factor kappa B subunit 2 This gene encodes a subunit of the transcription factor complex nuclear factor-kappa-B (NFkB). The NFkB complex is expressed in numerous cell types and functions as a central activator of genes involved in inflammation and immune function. The protein encoded by this gene can function as both a transcriptional activator or repressor depending on its dimerization partner. The p100 full-length protein is co-translationally processed into a p52 active form. Chromosomal rearrangements and translocations of this locus have been observed in B cell lymphomas, some of which may result in the formation of fusion proteins. There is a pseudogene for this gene on chromosome 18. Alternative splicing results in multiple transcript variants. NA
SNORA73B ENSG00000200087 ENSG00000200087 small nucleolar RNA, H/ACA box 73B NA NA
SLC7A5 ENSG00000103257 8140 solute carrier family 7 member 5 NA NA
NA ENSG00000182368 NA NA NA TRUE
BCL2A1 ENSG00000140379 597 BCL2 related protein A1 This gene encodes a member of the BCL-2 protein family. The proteins of this family form hetero- or homodimers and act as anti- and pro-apoptotic regulators that are involved in a wide variety of cellular activities such as embryonic development, homeostasis and tumorigenesis. The protein encoded by this gene is able to reduce the release of pro-apoptotic cytochrome c from mitochondria and block caspase activation. This gene is a direct transcription target of NF-kappa B in response to inflammatory mediators, and is up-regulated by different extracellular signals, such as granulocyte-macrophage colony-stimulating factor (GM-CSF), CD40, phorbol ester and inflammatory cytokine TNF and IL-1, which suggests a cytoprotective function that is essential for lymphocyte activation as well as cell survival. Alternatively spliced transcript variants encoding different isoforms have been found for this gene. NA
CCDC150P1 ENSG00000256304 ENSG00000256304 coiled-coil domain containing 150 pseudogene 1 NA NA
CHI3L1 ENSG00000133048 1116 chitinase 3 like 1 Chitinases catalyze the hydrolysis of chitin, which is an abundant glycopolymer found in insect exoskeletons and fungal cell walls. The glycoside hydrolase 18 family of chitinases includes eight human family members. This gene encodes a glycoprotein member of the glycosyl hydrolase 18 family. The protein lacks chitinase activity and is secreted by activated macrophages, chondrocytes, neutrophils and synovial cells. The protein is thought to play a role in the process of inflammation and tissue remodeling. NA
PIM1 ENSG00000137193 5292 Pim-1 proto-oncogene, serine/threonine kinase The protein encoded by this gene belongs to the Ser/Thr protein kinase family, and PIM subfamily. This gene is expressed primarily in B-lymphoid and myeloid cell lines, and is overexpressed in hematopoietic malignancies and in prostate cancer. It plays a role in signal transduction in blood cells, contributing to both cell proliferation and survival, and thus provides a selective advantage in tumorigenesis. Both the human and orthologous mouse genes have been reported to encode two isoforms (with preferential cellular localization) resulting from the use of alternative in-frame translation initiation codons, the upstream non-AUG (CUG) and downstream AUG codons (PMIDs:16186805, 1825810). NA
ZFP36 ENSG00000128016 7538 ZFP36 ring finger protein NA NA
IL4R ENSG00000077238 3566 interleukin 4 receptor This gene encodes the alpha chain of the interleukin-4 receptor, a type I transmembrane protein that can bind interleukin 4 and interleukin 13 to regulate IgE production. The encoded protein also can bind interleukin 4 to promote differentiation of Th2 cells. A soluble form of the encoded protein can be produced by proteolysis of the membrane-bound protein, and this soluble form can inhibit IL4-mediated cell proliferation and IL5 upregulation by T-cells. Allelic variations in this gene have been associated with atopy, a condition that can manifest itself as allergic rhinitis, sinusitis, asthma, or eczema. Polymorphisms in this gene are also associated with resistance to human immunodeficiency virus type-1 infection. Alternate splicing results in multiple transcript variants. NA
ATF3 ENSG00000162772 467 activating transcription factor 3 This gene encodes a member of the mammalian activation transcription factor/cAMP responsive element-binding (CREB) protein family of transcription factors. This gene is induced by a variety of signals, including many of those encountered by cancer cells, and is involved in the complex process of cellular stress response. Multiple transcript variants encoding different isoforms have been found for this gene. It is possible that alternative splicing of this gene may be physiologically important in the regulation of target genes. NA
SIK1 ENSG00000142178 150094 salt inducible kinase 1 NA NA
YBX3 ENSG00000060138 8531 Y-box binding protein 3 NA NA
TNFAIP3 ENSG00000118503 7128 TNF alpha induced protein 3 This gene was identified as a gene whose expression is rapidly induced by the tumor necrosis factor (TNF). The protein encoded by this gene is a zinc finger protein and ubiquitin-editing enzyme, and has been shown to inhibit NF-kappa B activation as well as TNF-mediated apoptosis. The encoded protein, which has both ubiquitin ligase and deubiquitinase activities, is involved in the cytokine-mediated immune and inflammatory responses. Several transcript variants encoding the same protein have been found for this gene. NA
HIST1H1E ENSG00000168298 3008 histone cluster 1, H1e Histones are basic nuclear proteins responsible for nucleosome structure of the chromosomal fiber in eukaryotes. Two molecules of each of the four core histones (H2A, H2B, H3, and H4) form an octamer, around which approximately 146 bp of DNA is wrapped in repeating units, called nucleosomes. The linker histone, H1, interacts with linker DNA between nucleosomes and functions in the compaction of chromatin into higher order structures. This gene is intronless and encodes a replication-dependent histone that is a member of the histone H1 family. Transcripts from this gene lack polyA tails but instead contain a palindromic termination element. This gene is found in the large histone gene cluster on chromosome 6. NA
MAFF ENSG00000185022 23764 MAF bZIP transcription factor F The protein encoded by this gene is a basic leucine zipper (bZIP) transcription factor that lacks a transactivation domain. It is known to bind the US-2 DNA element in the promoter of the oxytocin receptor (OTR) gene and most likely heterodimerizes with other leucine zipper-containing proteins to enhance expression of the OTR gene during term pregnancy. The encoded protein can also form homodimers, and since it lacks a transactivation domain, the homodimer may act as a repressor of transcription. This gene may also be involved in the cellular stress response. Multiple transcript variants encoding two different isoforms have been found for this gene. NA
GPR84 ENSG00000139572 53831 G protein-coupled receptor 84 NA NA
BHLHE40 ENSG00000134107 8553 basic helix-loop-helix family member e40 This gene encodes a basic helix-loop-helix protein expressed in various tissues. The encoded protein can interact with ARNTL or compete for E-box binding sites in the promoter of PER1 and repress CLOCK/ARNTL’s transactivation of PER1. This gene is believed to be involved in the control of circadian rhythm and cell differentiation. NA
PMAIP1 ENSG00000141682 5366 phorbol-12-myristate-13-acetate-induced protein 1 NA NA
GADD45B ENSG00000099860 4616 growth arrest and DNA damage inducible beta This gene is a member of a group of genes whose transcript levels are increased following stressful growth arrest conditions and treatment with DNA-damaging agents. The genes in this group respond to environmental stresses by mediating activation of the p38/JNK pathway. This activation is mediated via their proteins binding and activating MTK1/MEKK4 kinase, which is an upstream activator of both p38 and JNK MAPKs. The function of these genes or their protein products is involved in the regulation of growth and apoptosis. These genes are regulated by different mechanisms, but they are often coordinately expressed and can function cooperatively in inhibiting cell growth. NA
SLC2A3 ENSG00000059804 6515 solute carrier family 2 member 3 NA NA
RP13-638C3.2 ENSG00000262652 ENSG00000262652 NA NA NA
NAMPT ENSG00000105835 10135 nicotinamide phosphoribosyltransferase This gene encodes a protein that catalyzes the condensation of nicotinamide with 5-phosphoribosyl-1-pyrophosphate to yield nicotinamide mononucleotide, one step in the biosynthesis of nicotinamide adenine dinucleotide. The protein belongs to the nicotinic acid phosphoribosyltransferase (NAPRTase) family and is thought to be involved in many important biological processes, including metabolism, stress response and aging. This gene has a pseudogene on chromosome 10. NA
LOC100506142 ENSG00000250116 100506142 uncharacterized LOC100506142 NA NA
IL1B ENSG00000125538 3553 interleukin 1 beta The protein encoded by this gene is a member of the interleukin 1 cytokine family. This cytokine is produced by activated macrophages as a proprotein, which is proteolytically processed to its active form by caspase 1 (CASP1/ICE). This cytokine is an important mediator of the inflammatory response, and is involved in a variety of cellular activities, including cell proliferation, differentiation, and apoptosis. The induction of cyclooxygenase-2 (PTGS2/COX2) by this cytokine in the central nervous system (CNS) is found to contribute to inflammatory pain hypersensitivity. This gene and eight other interleukin 1 family genes form a cytokine gene cluster on chromosome 2. NA
RP11-373D23.2 ENSG00000270640 ENSG00000270640 NA NA NA
SNORA31 ENSG00000199477 677814 small nucleolar RNA, H/ACA box 31 NA NA
DUSP2 ENSG00000158050 1844 dual specificity phosphatase 2 The protein encoded by this gene is a member of the dual specificity protein phosphatase subfamily. These phosphatases inactivate their target kinases by dephosphorylating both the phosphoserine/threonine and phosphotyrosine residues. They negatively regulate members of the mitogen-activated protein (MAP) kinase superfamily (MAPK/ERK, SAPK/JNK, p38), which are associated with cellular proliferation and differentiation. Different members of the family of dual specificity phosphatases show distinct substrate specificities for various MAP kinases, different tissue distribution and subcellular localization, and different modes of inducibility of their expression by extracellular stimuli. This gene product inactivates ERK1 and ERK2, is predominantly expressed in hematopoietic tissues, and is localized in the nucleus. NA
NAMPTP1 ENSG00000229644 ENSG00000229644 nicotinamide phosphoribosyltransferase pseudogene 1 NA NA
RGPD2 ENSG00000185304 729857 RANBP2-like and GRIP domain containing 2 NA NA
RP11-34P13.15 ENSG00000268903 ENSG00000268903 NA NA NA
HUS1B ENSG00000188996 135458 HUS1 checkpoint clamp component B The protein encoded by this gene is most closely related to HUS1, a component of a cell cycle checkpoint protein complex involved in cell cycle arrest in response to DNA damage. This protein can interact with the check point protein RAD1 but not with RAD9. Overexpression of this protein has been shown to induce cell death, which suggests a related but distinct role of this protein, as compared to the HUS1. NA
AC017104.6 ENSG00000224376 ENSG00000224376 NA NA NA
RP4-536B24.2 ENSG00000260466 ENSG00000260466 NA NA NA
RP5-1056L3.3 ENSG00000226396 ENSG00000226396 NA NA NA
CHMP4BP1 ENSG00000258469 ENSG00000258469 charged multivesicular body protein 4B pseudogene 1 NA NA
DNAJB1 ENSG00000132002 3337 DnaJ heat shock protein family (Hsp40) member B1 This gene encodes a member of the DnaJ or Hsp40 (heat shock protein 40 kD) family of proteins. DNAJ family members are characterized by a highly conserved amino acid stretch called the ‘J-domain’ and function as one of the two major classes of molecular chaperones involved in a wide range of cellular events, such as protein folding and oligomeric protein complex assembly. The encoded protein is a molecular chaperone that stimulates the ATPase activity of Hsp70 heat-shock proteins in order to promote protein folding and prevent misfolded protein aggregation. Alternative splicing results in multiple transcript variants. NA
SBNO2 ENSG00000064932 22904 strawberry notch homolog 2 (Drosophila) NA NA
JUNB ENSG00000171223 3726 JunB proto-oncogene, AP-1 transcription factor subunit NA NA
AC005363.9 ENSG00000255513 ENSG00000255513 NA NA NA
RP11-888D10.4 ENSG00000273284 ENSG00000273284 NA NA NA
AC004471.9 ENSG00000223461 ENSG00000223461 NA NA NA
RFX2 ENSG00000087903 5990 regulatory factor X2 This gene is a member of the regulatory factor X gene family, which encodes transcription factors that contain a highly-conserved winged helix DNA binding domain. The protein encoded by this gene is structurally related to regulatory factors X1, X3, X4, and X5. It is a transcriptional activator that can bind DNA as a monomer or as a heterodimer with other RFX family members. This protein can bind to cis elements in the promoter of the IL-5 receptor alpha gene. Two transcript variants encoding different isoforms have been described for this gene, and both variants utilize alternative polyadenylation sites. NA
RP11-563J2.3 ENSG00000212743 ENSG00000212743 NA NA NA
RP11-324I22.3 ENSG00000269952 ENSG00000269952 NA NA NA
MYC ENSG00000136997 4609 v-myc avian myelocytomatosis viral oncogene homolog The protein encoded by this gene is a multifunctional, nuclear phosphoprotein that plays a role in cell cycle progression, apoptosis and cellular transformation. It functions as a transcription factor that regulates transcription of specific target genes. Mutations, overexpression, rearrangement and translocation of this gene have been associated with a variety of hematopoietic tumors, leukemias and lymphomas, including Burkitt lymphoma. There is evidence to show that alternative translation initiations from an upstream, in-frame non-AUG (CUG) and a downstream AUG start site result in the production of two isoforms with distinct N-termini. The synthesis of non-AUG initiated protein is suppressed in Burkitt’s lymphomas, suggesting its importance in the normal function of this gene. NA
SNORA7A ENSG00000207496 619563 small nucleolar RNA, H/ACA box 7A NA NA
AC073410.1 ENSG00000236047 ENSG00000236047 NA NA NA
AZU1 ENSG00000172232 566 azurocidin 1 Azurophil granules, specialized lysosomes of the neutrophil, contain at least 10 proteins implicated in the killing of microorganisms. This gene encodes a preproprotein that is proteolytically processed to generate a mature azurophil granule antibiotic protein, with monocyte chemotactic and antimicrobial activity. It is also an important multifunctional inflammatory mediator. This encoded protein is a member of the serine protease gene family but it is not a serine proteinase, because the active site serine and histidine residues are replaced. The genes encoding this protein, neutrophil elastase 2, and proteinase 3 are in a cluster located at chromosome 19pter. All 3 genes are expressed coordinately and their protein products are packaged together into azurophil granules during neutrophil differentiation. NA
NFIL3 ENSG00000165030 4783 nuclear factor, interleukin 3 regulated The protein encoded by this gene is a transcriptional regulator that binds as a homodimer to activating transcription factor (ATF) sites in many cellular and viral promoters. The encoded protein represses PER1 and PER2 expression and therefore plays a role in the regulation of circadian rhythm. Three transcript variants encoding the same protein have been found for this gene. NA
FBXW4P1 ENSG00000230701 26226 F-box and WD repeat domain containing 4 pseudogene 1 NA NA
PLAUR ENSG00000011422 5329 plasminogen activator, urokinase receptor This gene encodes the receptor for urokinase plasminogen activator and, given its role in localizing and promoting plasmin formation, likely influences many normal and pathological processes related to cell-surface plasminogen activation and localized degradation of the extracellular matrix. It binds both the proprotein and mature forms of urokinase plasminogen activator and permits the activation of the receptor-bound pro-enzyme by plasmin. The protein lacks transmembrane or cytoplasmic domains and may be anchored to the plasma membrane by a glycosyl-phosphatidylinositol (GPI) moiety following cleavage of the nascent polypeptide near its carboxy-terminus. However, a soluble protein is also produced in some cell types. Alternative splicing results in multiple transcript variants encoding different isoforms. The proprotein experiences several post-translational cleavage reactions that have not yet been fully defined. NA
ICAM1 ENSG00000090339 3383 intercellular adhesion molecule 1 This gene encodes a cell surface glycoprotein which is typically expressed on endothelial cells and cells of the immune system. It binds to integrins of type CD11a / CD18, or CD11b / CD18 and is also exploited by Rhinovirus as a receptor. NA
RP11-343H5.4 ENSG00000224114 ENSG00000224114 NA NA NA
UBE2R2-AS1 ENSG00000235481 ENSG00000235481 UBE2R2 antisense RNA 1 NA NA
RELB ENSG00000104856 5971 RELB proto-oncogene, NF-kB subunit NA NA
SNORA25 ENSG00000207112 684959 small nucleolar RNA, H/ACA box 25 NA NA
NA ENSG00000197697 NA NA NA TRUE
CTRL ENSG00000141086 1506 chymotrypsin like NA NA
RNF122 ENSG00000133874 79845 ring finger protein 122 The encoded protein contains a RING finger, a motif present in a variety of functionally distinct proteins and known to be involved in protein-protein and protein-DNA interactions. The encoded protein is localized to the endoplasmic reticulum and golgi apparatus, and may be associated with cell viability. NA
CTD-2369P2.8 ENSG00000267607 ENSG00000267607 NA NA NA
SLC11A1 ENSG00000018280 6556 solute carrier family 11 member 1 This gene is a member of the solute carrier family 11 (proton-coupled divalent metal ion transporters) family and encodes a multi-pass membrane protein. The protein functions as a divalent transition metal (iron and manganese) transporter involved in iron metabolism and host resistance to certain pathogens. Mutations in this gene have been associated with susceptibility to infectious diseases such as tuberculosis and leprosy, and inflammatory diseases such as rheumatoid arthritis and Crohn disease. Alternatively spliced variants that encode different protein isoforms have been described but the full-length nature of only one has been determined. NA
PIGHP1 ENSG00000259657 ENSG00000259657 phosphatidylinositol glycan anchor biosynthesis class H pseudogene 1 NA NA
SNORA64 ENSG00000207405 26784 small nucleolar RNA, H/ACA box 64 NA NA
TREM1 ENSG00000124731 54210 triggering receptor expressed on myeloid cells 1 This gene encodes a receptor belonging to the Ig superfamily that is expressed on myeloid cells. This protein amplifies neutrophil and monocyte-mediated inflammatory responses triggered by bacterial and fungal infections by stimulating release of pro-inflammatory chemokines and cytokines, as well as increased surface expression of cell activation markers. Alternatively spliced transcript variants encoding different isoforms have been noted for this gene. NA
BTG3 ENSG00000154640 10950 BTG family member 3 The protein encoded by this gene is a member of the BTG/Tob family. This family has structurally related proteins that appear to have antiproliferative properties. This encoded protein might play a role in neurogenesis in the central nervous system. Two transcript variants encoding different isoforms have been found for this gene. NA
NME2P1 ENSG00000123009 ENSG00000123009 NME/NM23 nucleoside diphosphate kinase 2 pseudogene 1 NA NA
TMEM217 ENSG00000172738 221468 transmembrane protein 217 NA NA
RP11-22N19.2 ENSG00000273320 ENSG00000273320 NA NA NA
SNORD10 ENSG00000238917 ENSG00000238917 small nucleolar RNA, C/D box 10 NA NA
NA ENSG00000179294 NA NA NA TRUE
RP11-727F15.13 ENSG00000269463 ENSG00000269463 NA NA NA
CXCL1 ENSG00000163739 2919 C-X-C motif chemokine ligand 1 This antimicrobial gene encodes a member of the CXC subfamily of chemokines. The encoded protein is a secreted growth factor that signals through the G-protein coupled receptor, CXC receptor 2. This protein plays a role in inflammation and as a chemoattractant for neutrophils. Aberrant expression of this protein is associated with the growth and progression of certain tumors. A naturally occurring processed form of this protein has increased chemotactic activity. Alternate splicing results in coding and non-coding variants of this gene. A pseudogene of this gene is found on chromosome 4. NA
RP13-638C3.3 ENSG00000262147 ENSG00000262147 NA NA NA
AP000593.7 ENSG00000255843 ENSG00000255843 NA NA NA
RP11-269F19.2 ENSG00000225721 ENSG00000225721 NA NA NA
TC2N ENSG00000165929 123036 tandem C2 domains, nuclear NA NA
AQP9 ENSG00000103569 366 aquaporin 9 The aquaporins are a family of water-selective membrane channels. This gene encodes a member of a subset of aquaporins called the aquaglyceroporins. This protein allows passage of a broad range of noncharged solutes and also stimulates urea transport and osmotic water permeability. This protein may also facilitate the uptake of glycerol in hepatic tissue. The encoded protein may also play a role in specialized leukocyte functions such as immunological response and bactericidal activity. Alternate splicing results in multiple transcript variants. NA
HIST2H2BF ENSG00000203814 440689 histone cluster 2, H2bf Histones are basic nuclear proteins that are responsible for the nucleosome structure of the chromosomal fiber in eukaryotes. This structure consists of approximately 146 bp of DNA wrapped around a nucleosome, an octamer composed of pairs of each of the four core histones (H2A, H2B, H3, and H4). The chromatin fiber is further compacted through the interaction of a linker histone, H1, with the DNA between the nucleosomes to form higher order chromatin structures. This gene encodes a replication-dependent histone that is a member of the histone H2B family and is found in a histone cluster on chromosome 1. NA
NA ENSG00000204807 NA NA NA TRUE
write.table(as.factor(out$query), paste0("../utilities/GTEX2013_sparse_load_voom/gene_names_clus_",10,".txt"), col.names = FALSE,
            row.names=FALSE, quote=FALSE);
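
The annotation tables in this report do not always list their columns in the same order, and queries with no match are marked only by `notfound = TRUE`. As a minimal sketch, the helper below puts the columns into one fixed order and turns `notfound` into an explicit TRUE/FALSE flag before the table is passed to `kable`. The function name `tidy_annotations` and the chosen column order are assumptions made for illustration and are not part of the original analysis. The toy input copies three rows from the Factor 11 table rather than calling `mygene::queryMany`.

```r
# Hypothetical helper (not part of the original analysis): arrange the
# annotation columns in one fixed order and make the `notfound` flag explicit.
tidy_annotations <- function(out,
                             cols = c("query", "symbol", "name", "X_id",
                                      "summary", "notfound")) {
  df <- as.data.frame(out, stringsAsFactors = FALSE)
  # Add any expected column that the query did not return, filled with NA.
  for (missing_col in setdiff(cols, names(df))) {
    df[[missing_col]] <- NA
  }
  df <- df[, cols, drop = FALSE]
  # Matched rows carry NA in `notfound`; record them as FALSE.
  df$notfound <- !is.na(df$notfound) & as.logical(df$notfound)
  rownames(df) <- NULL
  df
}

# Toy input: three rows copied from the Factor 11 table (one matched gene,
# one clone-based entry without a name, one unmatched query).
toy <- data.frame(
  name     = c("syncollin", NA, NA),
  summary  = c(NA_character_, NA_character_, NA_character_),
  X_id     = c("342898", "ENSG00000249790", NA),
  query    = c("ENSG00000179751", "ENSG00000249790", "ENSG00000184674"),
  symbol   = c("SYCN", "RP11-20D14.6", NA),
  notfound = c(NA, NA, TRUE),
  stringsAsFactors = FALSE
)

tidy <- tidy_annotations(toy)
# Ensembl IDs that the annotation service could not match.
unmatched <- tidy$query[tidy$notfound]
```

In the report itself, one would presumably call `kable(tidy_annotations(out))` in place of `kable(as.data.frame(out))`. The unmatched IDs can then be listed separately instead of appearing as rows of `NA`.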

Factor 11 Annotations

out <- mygene::queryMany(gene_list[11,],  scopes="ensembl.gene", fields=c("name", "summary", "symbol"), species="human");
## Finished
## Pass returnall=TRUE to return lists of duplicate or missing query terms.
kable(as.data.frame(out))
name summary X_id query symbol notfound
glycoprotein 2 This gene encodes an integral membrane protein that is secreted from intracellular zymogen granules and associates with the plasma membrane via glycosylphosphatidylinositol (GPI) linkage. The encoded protein binds pathogens such as enterobacteria, thereby playing an important role in the innate immune response. The C-terminus of this protein is related to the C-terminus of the protein encoded by the neighboring gene, uromodulin (UMOD). Alternative splicing results in multiple transcript variants. 2813 ENSG00000169347 GP2 NA
spexin hormone The protein encoded by this gene is a hormone involved in modulation of cardiovascular and renal function. It has also been shown in rats to cause weight loss. Several transcript variants have been found for this gene. 80763 ENSG00000134548 SPX NA
regenerating family member 1 alpha This gene is a type I subclass member of the Reg gene family. The Reg gene family is a multigene family grouped into four subclasses, types I, II, III and IV, based on the primary structures of the encoded proteins. This gene encodes a protein that is secreted by the exocrine pancreas. It is associated with islet cell regeneration and diabetogenesis and may be involved in pancreatic lithogenesis. Reg family members REG1B, REGL, PAP and this gene are tandemly clustered on chromosome 2p12 and may have arisen from the same ancestral gene by gene duplication. 5967 ENSG00000115386 REG1A NA
protease, serine 1 This gene encodes a trypsinogen, which is a member of the trypsin family of serine proteases. This enzyme is secreted by the pancreas and cleaved to its active form in the small intestine. It is active on peptide linkages involving the carboxyl group of lysine or arginine. Mutations in this gene are associated with hereditary pancreatitis. This gene and several other trypsinogen genes are localized to the T cell receptor beta locus on chromosome 7. 5644 ENSG00000204983 PRSS1 NA
chymotrypsin like elastase family member 3A Elastases form a subfamily of serine proteases that hydrolyze many proteins in addition to elastin. Humans have six elastase genes which encode the structurally similar proteins elastase 1, 2, 2A, 2B, 3A, and 3B. Unlike other elastases, elastase 3A has little elastolytic activity. Like most of the human elastases, elastase 3A is secreted from the pancreas as a zymogen and, like other serine proteases such as trypsin, chymotrypsin and kallikrein, it has a digestive function in the intestine. Elastase 3A preferentially cleaves proteins after alanine residues. Elastase 3A may also function in the intestinal transport and metabolism of cholesterol. Both elastase 3A and elastase 3B have been referred to as protease E and as elastase 1. 10136 ENSG00000142789 CELA3A NA
polymeric immunoglobulin receptor This gene is a member of the immunoglobulin superfamily. The encoded poly-Ig receptor binds polymeric immunoglobulin molecules at the basolateral surface of epithelial cells; the complex is then transported across the cell to be secreted at the apical surface. A significant association was found between immunoglobulin A nephropathy and several SNPs in this gene. 5284 ENSG00000162896 PIGR NA
pancreatic lipase This gene is a member of the lipase gene family. It encodes a carboxyl esterase that hydrolyzes insoluble, emulsified triglycerides, and is essential for the efficient digestion of dietary fats. This gene is expressed specifically in the pancreas. 5406 ENSG00000175535 PNLIP NA
phospholipase A2 group IB This gene encodes a secreted member of the phospholipase A2 (PLA2) class of enzymes, which is produced by the pancreatic acinar cells. The encoded calcium-dependent enzyme catalyzes the hydrolysis of the sn-2 position of membrane glycerophospholipids to release arachidonic acid (AA) and lysophospholipids. AA is subsequently converted by downstream metabolic enzymes to several bioactive lipophilic compounds (eicosanoids), including prostaglandins (PGs) and leukotrienes (LTs). The enzyme may be involved in several physiological processes including cell contraction, cell proliferation and pathological response. 5319 ENSG00000170890 PLA2G1B NA
chymotrypsin like elastase family member 3B Elastases form a subfamily of serine proteases that hydrolyze many proteins in addition to elastin. Humans have six elastase genes which encode the structurally similar proteins elastase 1, 2, 2A, 2B, 3A, and 3B. Unlike other elastases, elastase 3B has little elastolytic activity. Like most of the human elastases, elastase 3B is secreted from the pancreas as a zymogen and, like other serine proteases such as trypsin, chymotrypsin and kallikrein, it has a digestive function in the intestine. Elastase 3B preferentially cleaves proteins after alanine residues. Elastase 3B may also function in the intestinal transport and metabolism of cholesterol. Both elastase 3A and elastase 3B have been referred to as protease E and as elastase 1, and excretion of this protein in fecal material is frequently used as a measure of pancreatic function in clinical assays. 23436 ENSG00000219073 CELA3B NA
fibrinogen beta chain The protein encoded by this gene is the beta component of fibrinogen, a blood-borne glycoprotein comprised of three pairs of nonidentical polypeptide chains. Following vascular injury, fibrinogen is cleaved by thrombin to form fibrin which is the most abundant component of blood clots. In addition, various cleavage products of fibrinogen and fibrin regulate cell adhesion and spreading, display vasoconstrictor and chemotactic activities, and are mitogens for several cell types. Mutations in this gene lead to several disorders, including afibrinogenemia, dysfibrinogenemia, hypodysfibrinogenemia and thrombotic tendency. Alternatively spliced transcript variants encoding different isoforms have been found for this gene. 2244 ENSG00000171564 FGB NA
NA NA ENSG00000249790 ENSG00000249790 RP11-20D14.6 NA
carboxypeptidase A1 This gene encodes a member of the carboxypeptidase A family of zinc metalloproteases. This enzyme is produced in the pancreas and preferentially cleaves C-terminal branched-chain and aromatic amino acids from dietary proteins. This gene and several family members are present in a gene cluster on chromosome 7. Mutations in this gene may be linked to chronic pancreatitis, while elevated protein levels may be associated with pancreatic cancer. 1357 ENSG00000091704 CPA1 NA
NA NA ENSG00000272030 ENSG00000272030 RP1-178F15.4 NA
colipase The protein encoded by this gene is a cofactor needed by pancreatic lipase for efficient dietary lipid hydrolysis. It binds to the C-terminal, non-catalytic domain of lipase, thereby stabilizing an active conformation and considerably increasing the overall hydrophobic binding site. The gene product allows lipase to anchor noncovalently to the surface of lipid micelles, counteracting the destabilizing influence of intestinal bile salts. This cofactor is only expressed in pancreatic acinar cells, suggesting regulation of expression by tissue-specific elements. Three transcript variants encoding different isoforms have been found for this gene. 1208 ENSG00000137392 CLPS NA
syncollin NA 342898 ENSG00000179751 SYCN NA
apolipoprotein C3 Apolipoprotein C-III is a very low density lipoprotein (VLDL) protein. APOC3 inhibits lipoprotein lipase and hepatic lipase; it is thought to delay catabolism of triglyceride-rich particles. The APOA1, APOC3 and APOA4 genes are closely linked in both rat and human genomes. The A-I and A-IV genes are transcribed from the same strand, while the A-I and C-III genes are convergently transcribed. An increase in apoC-III levels induces the development of hypertriglyceridemia. 345 ENSG00000110245 APOC3 NA
spondin 2 NA 10417 ENSG00000159674 SPON2 NA
zymogen granule protein 16B NA 124220 ENSG00000162078 ZG16B NA
mucin 7, secreted This gene encodes a small salivary mucin, which is thought to play a role in facilitating the clearance of bacteria in the oral cavity and to aid in mastication, speech, and swallowing. The central domain of this glycoprotein contains tandem repeats, each composed of 23 amino acids. This antimicrobial protein has antibacterial and antifungal activity. The most common allele contains 6 repeats, and some alleles may be associated with susceptibility to asthma. Alternatively spliced transcript variants with different 5’ UTR, but encoding the same protein, have been found for this gene. 4589 ENSG00000171195 MUC7 NA
fibronectin 1 This gene encodes fibronectin, a glycoprotein present in a soluble dimeric form in plasma, and in a dimeric or multimeric form at the cell surface and in extracellular matrix. The encoded preproprotein is proteolytically processed to generate the mature protein. Fibronectin is involved in cell adhesion and migration processes including embryogenesis, wound healing, blood coagulation, host defense, and metastasis. The gene has three regions subject to alternative splicing, with the potential to produce 20 different transcript variants, at least one of which encodes an isoform that undergoes proteolytic processing. The full-length nature of some variants has not been determined. 2335 ENSG00000115414 FN1 NA
NA NA NA ENSG00000184674 NA TRUE
hedgehog acyltransferase-like NA 57467 ENSG00000010282 HHATL NA
chymotrypsinogen B1 The protein encoded by this gene is one of a family of serine proteases that is secreted into the gastrointestinal tract as an inactive precursor, which is activated by proteolytic cleavage with trypsin. 1504 ENSG00000168925 CTRB1 NA
charged multivesicular body protein 4C CHMP4C belongs to the chromatin-modifying protein/charged multivesicular body protein (CHMP) family. These proteins are components of ESCRT-III (endosomal sorting complex required for transport III), a complex involved in degradation of surface receptor proteins and formation of endocytic multivesicular bodies (MVBs). Some CHMPs have both nuclear and cytoplasmic/vesicular distributions, and one such CHMP, CHMP1A (MIM 164010), is required for both MVB formation and regulation of cell cycle progression (Tsang et al., 2006 [PubMed 16730941]). 92421 ENSG00000164695 CHMP4C NA
neurogranin Neurogranin (NRGN) is the human homolog of the neuron-specific rat RC3/neurogranin gene. This gene encodes a postsynaptic protein kinase substrate that binds calmodulin in the absence of calcium. The NRGN gene contains four exons and three introns. Exons 1 and 2 encode the protein and exons 3 and 4 contain untranslated sequences. It is suggested that NRGN is a direct target for thyroid hormone in human brain, and that control of expression of this gene could underlie many of the consequences of hypothyroidism on mental states during development as well as in adult subjects. 4900 ENSG00000154146 NRGN NA
NA NA NA ENSG00000250606 NA TRUE
pancreatic lipase related protein 1 NA 5407 ENSG00000187021 PNLIPRP1 NA
secreted frizzled related protein 4 Secreted frizzled-related protein 4 (SFRP4) is a member of the SFRP family that contains a cysteine-rich domain homologous to the putative Wnt-binding site of Frizzled proteins. SFRPs act as soluble modulators of Wnt signaling. The expression of SFRP4 in ventricular myocardium correlates with apoptosis related gene expression. 6424 ENSG00000106483 SFRP4 NA
nexilin F-actin binding protein This gene encodes a filamentous actin-binding protein that may function in cell adhesion and migration. Mutations in this gene have been associated with dilated cardiomyopathy, also known as CMD1CC. Alternatively spliced transcript variants have been described. 91624 ENSG00000162614 NEXN NA
chymotrypsinogen B2 NA 440387 ENSG00000168928 CTRB2 NA
chymotrypsin like elastase family member 2A Elastases form a subfamily of serine proteases that hydrolyze many proteins in addition to elastin. Humans have six elastase genes which encode the structurally similar proteins elastase 1, 2, 2A, 2B, 3A, and 3B. Like most of the human elastases, elastase 2A is secreted from the pancreas as a zymogen. In other species, elastase 2A has been shown to preferentially cleave proteins after leucine, methionine, and phenylalanine residues. 63036 ENSG00000142615 CELA2A NA
nebulette This gene encodes a nebulin like protein that is abundantly expressed in cardiac muscle. The encoded protein binds actin and interacts with thin filaments and Z-line associated proteins in striated muscle. This protein may be involved in cardiac myofibril assembly. A shorter isoform of this protein termed LIM nebulette is expressed in non-muscle cells and may function as a component of focal adhesion complexes. Alternate splicing results in multiple transcript variants. 10529 ENSG00000078114 NEBL NA
myosin light chain 1 Myosin is a hexameric ATPase cellular motor protein. It is composed of two heavy chains, two nonphosphorylatable alkali light chains, and two phosphorylatable regulatory light chains. This gene encodes a myosin alkali light chain expressed in fast skeletal muscle. Two transcript variants have been identified for this gene. 4632 ENSG00000168530 MYL1 NA
collagen type V alpha 1 This gene encodes an alpha chain for one of the low abundance fibrillar collagens. Fibrillar collagen molecules are trimers that can be composed of one or more types of alpha chains. Type V collagen is found in tissues containing type I collagen and appears to regulate the assembly of heterotypic fibers composed of both type I and type V collagen. This gene product is closely related to type XI collagen and it is possible that the collagen chains of types V and XI constitute a single collagen type with tissue-specific chain combinations. The encoded procollagen protein occurs commonly as the heterotrimer pro-alpha1(V)-pro-alpha1(V)-pro-alpha2(V). Mutations in this gene are associated with Ehlers-Danlos syndrome, types I and II. Alternative splicing of this gene results in multiple transcript variants. 1289 ENSG00000130635 COL5A1 NA
NA NA ENSG00000273179 ENSG00000273179 RP11-20I20.4 NA
NA NA ENSG00000259279 ENSG00000259279 CTD-2033D15.1 NA
filamin C This gene encodes one of three related filamin genes, specifically gamma filamin. These filamin proteins crosslink actin filaments into orthogonal networks in cortical cytoplasm and participate in the anchoring of membrane proteins for the actin cytoskeleton. Three functional domains exist in filamin: an N-terminal filamentous actin-binding domain, a C-terminal self-association domain, and a membrane glycoprotein-binding domain. Two transcript variants encoding different isoforms have been found for this gene. 2318 ENSG00000128591 FLNC NA
NA NA ENSG00000268649 ENSG00000268649 RP4-806M20.4 NA
myosin, heavy chain 1, skeletal muscle, adult Myosin is a major contractile protein which converts chemical energy into mechanical energy through the hydrolysis of ATP. Myosin is a hexameric protein composed of a pair of myosin heavy chains (MYH) and two pairs of nonidentical light chains. Myosin heavy chains are encoded by a multigene family. In mammals at least 10 different myosin heavy chain (MYH) isoforms have been described from striated, smooth, and nonmuscle cells. These isoforms show expression that is spatially and temporally regulated during development. 4619 ENSG00000109061 MYH1 NA
collagen type I alpha 1 This gene encodes the pro-alpha1 chains of type I collagen whose triple helix comprises two alpha1 chains and one alpha2 chain. Type I is a fibril-forming collagen found in most connective tissues and is abundant in bone, cornea, dermis and tendon. Mutations in this gene are associated with osteogenesis imperfecta types I-IV, Ehlers-Danlos syndrome type VIIA, Ehlers-Danlos syndrome Classical type, Caffey Disease and idiopathic osteoporosis. Reciprocal translocations between chromosomes 17 and 22, where this gene and the gene for platelet-derived growth factor beta are located, are associated with a particular type of skin tumor called dermatofibrosarcoma protuberans, resulting from unregulated expression of the growth factor. Two transcripts, resulting from the use of alternate polyadenylation signals, have been identified for this gene. 1277 ENSG00000108821 COL1A1 NA
TM4SF19 antisense RNA 1 NA 100874214 ENSG00000235897 TM4SF19-AS1 NA
serine incorporator 2 NA 347735 ENSG00000168528 SERINC2 NA
collagen type VI alpha 1 The collagens are a superfamily of proteins that play a role in maintaining the integrity of various tissues. Collagens are extracellular matrix proteins and have a triple-helical domain as their common structural element. Collagen VI is a major structural component of microfibrils. The basic structural unit of collagen VI is a heterotrimer of the alpha1(VI), alpha2(VI), and alpha3(VI) chains. The alpha2(VI) and alpha3(VI) chains are encoded by the COL6A2 and COL6A3 genes, respectively. The protein encoded by this gene is the alpha 1 subunit of type VI collagen (alpha1(VI) chain). Mutations in the genes that code for the collagen VI subunits result in the autosomal dominant disorder, Bethlem myopathy. 1291 ENSG00000142156 COL6A1 NA
sarcoglycan alpha This gene encodes a component of the dystrophin-glycoprotein complex (DGC), which is critical to the stability of muscle fiber membranes and to the linking of the actin cytoskeleton to the extracellular matrix. Its expression is thought to be restricted to striated muscle. Mutations in this gene result in type 2D autosomal recessive limb-girdle muscular dystrophy. Multiple transcript variants encoding different isoforms have been found for this gene. 6442 ENSG00000108823 SGCA NA
NA NA ENSG00000212743 ENSG00000212743 RP11-563J2.3 NA
pepsinogen 3, group I (pepsinogen A) This gene encodes a protein precursor of the digestive enzyme pepsin, a member of the peptidase A1 family of endopeptidases. The encoded precursor is secreted by gastric chief cells and undergoes autocatalytic cleavage in acidic conditions to form the active enzyme, which functions in the digestion of dietary proteins. This gene is found in a cluster of related genes on chromosome 11, each of which encodes one of multiple pepsinogens. Pepsinogen levels in serum may serve as a biomarker for atrophic gastritis and gastric cancer. 643834 ENSG00000229859 PGA3 NA
high mobility group box 1 pseudogene 3 NA ENSG00000250011 ENSG00000250011 HMGB1P3 NA
tetraspanin 1 The protein encoded by this gene is a member of the transmembrane 4 superfamily, also known as the tetraspanin family. Most of these members are cell-surface proteins that are characterized by the presence of four hydrophobic domains. The proteins mediate signal transduction events that play a role in the regulation of cell development, activation, growth and motility. 10103 ENSG00000117472 TSPAN1 NA
fibrinogen alpha chain This gene encodes the alpha subunit of the coagulation factor fibrinogen, which is a component of the blood clot. Following vascular injury, the encoded preproprotein is proteolytically processed by thrombin during the conversion of fibrinogen to fibrin. Mutations in this gene lead to several disorders, including dysfibrinogenemia, hypofibrinogenemia, afibrinogenemia and renal amyloidosis. Alternative splicing results in multiple transcript variants, at least one of which encodes an isoform that undergoes proteolytic processing. 2243 ENSG00000171560 FGA NA
orosomucoid 1 This gene encodes a key acute phase plasma protein. Because of its increase due to acute inflammation, this protein is classified as an acute-phase reactant. The specific function of this protein has not yet been determined; however, it may be involved in aspects of immunosuppression. 5004 ENSG00000229314 ORM1 NA
NA NA NA ENSG00000272403 NA TRUE
NA NA NA ENSG00000197262 NA TRUE
sphingomyelin phosphodiesterase acid like 3A NA 10924 ENSG00000172594 SMPDL3A NA
C-C motif chemokine ligand 21 This antimicrobial gene is one of several CC cytokine genes clustered on the p-arm of chromosome 9. Cytokines are a family of secreted proteins involved in immunoregulatory and inflammatory processes. The CC cytokines are proteins characterized by two adjacent cysteines. Similar to other chemokines the protein encoded by this gene inhibits hemopoiesis and stimulates chemotaxis. This protein is chemotactic in vitro for thymocytes and activated T cells, but not for B cells, macrophages, or neutrophils. The cytokine encoded by this gene may also play a role in mediating homing of lymphocytes to secondary lymphoid organs. It is a high affinity functional ligand for chemokine receptor 7 that is expressed on T and B lymphocytes and a known receptor for another member of the cytokine family (small inducible cytokine A19). 6366 ENSG00000137077 CCL21 NA
myosin, heavy chain 7B, cardiac muscle, beta The myosin II molecule is a multi-subunit complex consisting of two heavy chains and four light chains. This gene encodes a heavy chain of myosin II, which is a member of the motor-domain superfamily. The heavy chain includes a globular motor domain, which catalyzes ATP hydrolysis and interacts with actin, and a tail domain in which heptad repeat sequences promote dimerization by interacting to form a rod-like alpha-helical coiled coil. This heavy chain subunit is a slow-twitch myosin. Alternatively spliced transcript variants have been found, but the full-length nature of these variants is not determined. 57644 ENSG00000078814 MYH7B NA
collagen type VI alpha 2 This gene encodes one of the three alpha chains of type VI collagen, a beaded filament collagen found in most connective tissues. The product of this gene contains several domains similar to von Willebrand Factor type A domains. These domains have been shown to bind extracellular matrix proteins, an interaction that explains the importance of this collagen in organizing matrix components. Mutations in this gene are associated with Bethlem myopathy and Ullrich scleroatonic muscular dystrophy. Three transcript variants have been identified for this gene. 1292 ENSG00000142173 COL6A2 NA
prostaglandin-endoperoxide synthase 1 This is one of two genes encoding similar enzymes that catalyze the conversion of arachidonate to prostaglandin. The encoded protein regulates angiogenesis in endothelial cells, and is inhibited by nonsteroidal anti-inflammatory drugs such as aspirin. Based on its ability to function as both a cyclooxygenase and as a peroxidase, the encoded protein has been identified as a moonlighting protein. The protein may promote cell proliferation during tumor progression. Alternative splicing results in multiple transcript variants. 5742 ENSG00000095303 PTGS1 NA
heparan sulfate-glucosamine 3-sulfotransferase 3B1 The protein encoded by this gene is a type II integral membrane protein that belongs to the 3-O-sulfotransferases family. These proteins catalyze the addition of sulfate groups at the 3-OH position of glucosamine in heparan sulfate. The substrate specificity of individual members of the family is based on prior modification of the heparan sulfate chain, thus allowing different members of the family to generate binding sites for different proteins on the same heparan sulfate chain. Following treatment with a histone deacetylase inhibitor, expression of this gene is activated in a pancreatic cell line. The increased expression results in promotion of the epithelial-mesenchymal transition. In addition, the modification catalyzed by this protein allows herpes simplex virus membrane fusion and penetration. A very closely related homolog with an almost identical sulfotransferase domain maps less than 1 Mb away. Alternative splicing results in multiple transcript variants. 9953 ENSG00000125430 HS3ST3B1 NA
spectrin beta, non-erythrocytic 2 Spectrins are principal components of a cell’s membrane-cytoskeleton and are composed of two alpha and two beta spectrin subunits. The protein encoded by this gene (SPTBN2) is called spectrin beta non-erythrocytic 2 or beta-III spectrin. It is related to, but distinct from, the beta-II spectrin gene which is also known as spectrin beta non-erythrocytic 1 (SPTBN1). SPTBN2 regulates the glutamate signaling pathway by stabilizing the glutamate transporter EAAT4 at the surface of the plasma membrane. Mutations in this gene cause a form of spinocerebellar ataxia, SCA5, that is characterized by neurodegeneration, progressive locomotor incoordination, dysarthria, and uncoordinated eye movements. 6712 ENSG00000173898 SPTBN2 NA
copine 5 Calcium-dependent membrane-binding proteins may regulate molecular events at the interface of the cell membrane and cytoplasm. This gene is one of several genes that encode a calcium-dependent protein containing two N-terminal type II C2 domains and an integrin A domain-like sequence in the C-terminus. Several alternatively spliced transcript variants encoding different isoforms have been found for this gene. More variants may exist, but their full-length natures could not be determined. 57699 ENSG00000124772 CPNE5 NA
lysyl oxidase like 2 This gene encodes a member of the lysyl oxidase gene family. The prototypic member of the family is essential to the biogenesis of connective tissue, encoding an extracellular copper-dependent amine oxidase that catalyses the first step in the formation of crosslinks in collagens and elastin. A highly conserved amino acid sequence at the C-terminus end appears to be sufficient for amine oxidase activity, suggesting that each family member may retain this function. The N-terminus is poorly conserved and may impart additional roles in developmental regulation, senescence, tumor suppression, cell growth control, and chemotaxis to each member of the family. 4017 ENSG00000134013 LOXL2 NA
smoothelin like 1 SMTNL1, which is a member of the smoothelin (SMTN; MIM 602127) family, regulates contraction and relaxation of skeletal and smooth muscle fibers and mediates vascular adaptation to exercise (Wooldridge et al., 2008 [PubMed 18310078]). 219537 ENSG00000214872 SMTNL1 NA
thrombospondin 1 The protein encoded by this gene is a subunit of a disulfide-linked homotrimeric protein. This protein is an adhesive glycoprotein that mediates cell-to-cell and cell-to-matrix interactions. This protein can bind to fibrinogen, fibronectin, laminin, type V collagen and integrins alpha-V/beta-1. This protein has been shown to play roles in platelet aggregation, angiogenesis, and tumorigenesis. 7057 ENSG00000137801 THBS1 NA
NA NA ENSG00000258376 ENSG00000258376 RP4-647C14.2 NA
natriuretic peptide receptor 1 Guanylyl cyclases, catalyzing the production of cGMP from GTP, are classified as soluble and membrane forms (Garbers and Lowe, 1994 [PubMed 7982997]). The membrane guanylyl cyclases, often termed guanylyl cyclases A through F, form a family of cell-surface receptors with a similar topographic structure: an extracellular ligand-binding domain, a single membrane-spanning domain, and an intracellular region that contains a protein kinase-like domain and a cyclase catalytic domain. GC-A and GC-B function as receptors for natriuretic peptides; they are also referred to as atrial natriuretic peptide receptor A (NPR1) and type B (NPR2; MIM 108961). Also see NPR3 (MIM 108962), which encodes a protein with only the ligand-binding transmembrane and 37-amino acid cytoplasmic domains. NPR1 is a membrane-bound guanylate cyclase that serves as the receptor for both atrial and brain natriuretic peptides (ANP (MIM 108780) and BNP (MIM 600295), respectively). 4881 ENSG00000169418 NPR1 NA
collagen type IX alpha 3 This gene encodes one of the three alpha chains of type IX collagen, the major collagen component of hyaline cartilage. Type IX collagen, a heterotrimeric molecule, is usually found in tissues containing type II collagen, a fibrillar collagen. Mutations in this gene are associated with multiple epiphyseal dysplasia type 3. 1299 ENSG00000092758 COL9A3 NA
NA NA ENSG00000254680 ENSG00000254680 RP11-265D17.2 NA
scavenger receptor cysteine rich family member with 5 domains NA 284297 ENSG00000179954 SSC5D NA
carboxypeptidase B1 Three different procarboxypeptidases A and two different procarboxypeptidases B have been isolated. The B1 and B2 forms differ from each other mainly in isoelectric point. Carboxypeptidase B1 is a highly tissue-specific protein and is a useful serum marker for acute pancreatitis and dysfunction of pancreatic transplants. It is not elevated in pancreatic carcinoma. 1360 ENSG00000153002 CPB1 NA
matrix metallopeptidase 2 This gene is a member of the matrix metalloproteinase (MMP) gene family, that are zinc-dependent enzymes capable of cleaving components of the extracellular matrix and molecules involved in signal transduction. The protein encoded by this gene is a gelatinase A, type IV collagenase, that contains three fibronectin type II repeats in its catalytic site that allow binding of denatured type IV and V collagen and elastin. Unlike most MMP family members, activation of this protein can occur on the cell membrane. This enzyme can be activated extracellularly by proteases, or, intracellulary by its S-glutathiolation with no requirement for proteolytical removal of the pro-domain. This protein is thought to be involved in multiple pathways including roles in the nervous system, endometrial menstrual breakdown, regulation of vascularization, and metastasis. Mutations in this gene have been associated with Winchester syndrome and Nodulosis-Arthropathy-Osteolysis (NAO) syndrome. Alternative splicing results in multiple transcript variants encoding different isoforms. 4313 ENSG00000087245 MMP2 NA
solute carrier family 7 member 11 This gene encodes a member of a heteromeric, sodium-independent, anionic amino acid transport system that is highly specific for cysteine and glutamate. In this system, designated Xc(-), the anionic form of cysteine is transported in exchange for glutamate. This protein has been identified as the predominant mediator of Kaposi sarcoma-associated herpesvirus fusion and entry permissiveness into cells. Also, increased expression of this gene in primary gliomas (compared to normal brain tissue) was associated with increased glutamate secretion via the XCT channels, resulting in neuronal cell death. 23657 ENSG00000151012 SLC7A11 NA
NA NA ENSG00000264272 ENSG00000264272 CTD-2514K5.4 NA
radial spoke head 1 homolog This gene encodes a male meiotic metaphase chromosome-associated acidic protein. This gene is expressed in tissues with motile cilia or flagella, including the trachea, lungs, airway brushings, and testes. Mutations in this gene result in primary ciliary dyskinesia-24. Alternatively spliced transcript variants encoding different isoforms have been found for this gene. 89765 ENSG00000160188 RSPH1 NA
surfactant protein A2 This gene is one of several genes encoding pulmonary-surfactant associated proteins (SFTPA) located on chromosome 10. Mutations in this gene and a highly similar gene located nearby, which affect the highly conserved carbohydrate recognition domain, are associated with idiopathic pulmonary fibrosis. The current version of the assembly displays only a single centromeric SFTPA gene pair rather than the two gene pairs shown in the previous assembly which were thought to have resulted from a duplication. 729238 ENSG00000185303 SFTPA2 NA
cerebellin 3 precursor Members of the precerebellin family, such as CBLN3, contain a cerebellin motif (see CBLN1; MIM 600432) and a C-terminal C1q signature domain (see MIM 120550) that mediates trimeric assembly of atypical collagen complexes. However, precerebellins do not contain a collagen motif, suggesting that they are not conventional components of the extracellular matrix (Pang et al., 2000 [PubMed 10964938]). 643866 ENSG00000139899 CBLN3 NA
ribosomal protein L36 pseudogene 4 NA ENSG00000224497 ENSG00000224497 RPL36P4 NA
ubiquitin associated and SH3 domain containing B This gene encodes a protein that contains a ubiquitin associated domain at the N-terminus, an SH3 domain, and a C-terminal domain with similarities to the catalytic motif of phosphoglycerate mutase. The encoded protein was found to inhibit endocytosis of epidermal growth factor receptor (EGFR) and platelet-derived growth factor receptor. 84959 ENSG00000154127 UBASH3B NA
NA NA ENSG00000272512 ENSG00000272512 RP11-54O7.17 NA
hypoxia inducible lipid droplet associated NA 29923 ENSG00000135245 HILPDA NA
galectin 4 The galectins are a family of beta-galactoside-binding proteins implicated in modulating cell-cell and cell-matrix interactions. The expression of this gene is restricted to small intestine, colon, and rectum, and it is underexpressed in colorectal cancer. 3960 ENSG00000171747 LGALS4 NA
chromosome 8 open reading frame 88 NA 100127983 ENSG00000253250 C8orf88 NA
polypeptide N-acetylgalactosaminyltransferase 12 This gene encodes a member of a family of UDP-GalNAc:polypeptide N-acetylgalactosaminyltransferases, which catalyze the transfer of N-acetylgalactosamine (GalNAc) from UDP-GalNAc to a serine or threonine residue on a polypeptide acceptor in the initial step of O-linked protein glycosylation. Mutations in this gene are associated with an increased susceptibility to colorectal cancer. 79695 ENSG00000119514 GALNT12 NA
lymphocyte activating 3 Lymphocyte-activation protein 3 belongs to Ig superfamily and contains 4 extracellular Ig-like domains. The LAG3 gene contains 8 exons. The sequence data, exon/intron organization, and chromosomal localization all indicate a close relationship of LAG3 to CD4. 3902 ENSG00000089692 LAG3 NA
collagen type I alpha 2 chain This gene encodes the pro-alpha2 chain of type I collagen whose triple helix comprises two alpha1 chains and one alpha2 chain. Type I is a fibril-forming collagen found in most connective tissues and is abundant in bone, cornea, dermis and tendon. Mutations in this gene are associated with osteogenesis imperfecta types I-IV, Ehlers-Danlos syndrome type VIIB, recessive Ehlers-Danlos syndrome Classical type, idiopathic osteoporosis, and atypical Marfan syndrome. Symptoms associated with mutations in this gene, however, tend to be less severe than mutations in the gene for the alpha1 chain of type I collagen (COL1A1) reflecting the different role of alpha2 chains in matrix integrity. Three transcripts, resulting from the use of alternate polyadenylation signals, have been identified for this gene. 1278 ENSG00000164692 COL1A2 NA
integrin subunit alpha 5 The product of this gene belongs to the integrin alpha chain family. Integrins are heterodimeric integral membrane proteins composed of an alpha subunit and a beta subunit that function in cell surface adhesion and signaling. The encoded preproprotein is proteolytically processed to generate light and heavy chains that comprise the alpha 5 subunit. This subunit associates with the beta 1 subunit to form a fibronectin receptor. This integrin may promote tumor invasion, and higher expression of this gene may be correlated with shorter survival time in lung cancer patients. Note that the integrin alpha 5 and integrin alpha V subunits are encoded by distinct genes. 3678 ENSG00000161638 ITGA5 NA
tissue differentiation-inducing non-protein coding RNA This gene produces a spliced long non-coding RNA that is required for normal epidermal differentiation. This transcript regulates the expression of genes involved in the differentiation of epidermal tissue. Mutations in some of the genes targeted by this transcript have been implicated in epidermal skin diseases. 257000 ENSG00000223573 TINCR NA
lipase F, gastric type This gene encodes gastric lipase, an enzyme involved in the digestion of dietary triglycerides in the gastrointestinal tract, and responsible for 30% of fat digestion processes occurring in human. It is secreted by gastric chief cells in the fundic mucosa of the stomach, and it hydrolyzes the ester bonds of triglycerides under acidic pH conditions. The gene is a member of a conserved gene family of lipases that play distinct roles in neutral lipid metabolism. Several transcript variants encoding different isoforms have been found for this gene. 8513 ENSG00000182333 LIPF NA
apolipoprotein B mRNA editing enzyme catalytic subunit 3F This gene is a member of the cytidine deaminase gene family. It is one of seven related genes or pseudogenes found in a cluster, thought to result from gene duplication, on chromosome 22. Members of the cluster encode proteins that are structurally and functionally related to the C to U RNA-editing cytidine deaminase APOBEC1. It is thought that the proteins may be RNA editing enzymes and have roles in growth or cell cycle control. Alternatively spliced transcript variants encoding different isoforms have been identified. 200316 ENSG00000128394 APOBEC3F NA
AHNAK nucleoprotein 2 NA 113146 ENSG00000185567 AHNAK2 NA
hydroxysteroid 11-beta dehydrogenase 1 The protein encoded by this gene is a microsomal enzyme that catalyzes the conversion of the stress hormone cortisol to the inactive metabolite cortisone. In addition, the encoded protein can catalyze the reverse reaction, the conversion of cortisone to cortisol. Too much cortisol can lead to central obesity, and a particular variation in this gene has been associated with obesity and insulin resistance in children. Mutations in this gene and H6PD (hexose-6-phosphate dehydrogenase (glucose 1-dehydrogenase)) are the cause of cortisone reductase deficiency. Alternate splicing results in multiple transcript variants encoding the same protein. 3290 ENSG00000117594 HSD11B1 NA
PCOLCE antisense RNA 1 NA 100129845 ENSG00000224729 PCOLCE-AS1 NA
metallothionein 3 NA 4504 ENSG00000087250 MT3 NA
hemopexin This gene encodes a plasma glycoprotein that binds heme with high affinity. The encoded protein is an acute phase protein that transports heme from the plasma to the liver and may be involved in protecting cells from oxidative stress. 3263 ENSG00000110169 HPX NA
egl-9 family hypoxia inducible factor 3 NA 112399 ENSG00000129521 EGLN3 NA
elastin microfibril interfacer 1 This gene encodes an extracellular matrix glycoprotein that is characterized by an N-terminal microfibril interface domain, a coiled-coiled alpha-helical domain, a collagenous domain and a C-terminal globular C1q domain. The encoded protein associates with elastic fibers at the interface between elastin and microfibrils and may play a role in the development of elastic tissues including large blood vessels, dermis, heart and lung. 11117 ENSG00000138080 EMILIN1 NA
DLGAP1 antisense RNA 1 NA ENSG00000177337 ENSG00000177337 DLGAP1-AS1 NA
prolactin induced protein NA 5304 ENSG00000159763 PIP NA
glutathione S-transferase mu 1 Cytosolic and membrane-bound forms of glutathione S-transferase are encoded by two distinct supergene families. At present, eight distinct classes of the soluble cytoplasmic mammalian glutathione S-transferases have been identified: alpha, kappa, mu, omega, pi, sigma, theta and zeta. This gene encodes a glutathione S-transferase that belongs to the mu class. The mu class of enzymes functions in the detoxification of electrophilic compounds, including carcinogens, therapeutic drugs, environmental toxins and products of oxidative stress, by conjugation with glutathione. The genes encoding the mu class of enzymes are organized in a gene cluster on chromosome 1p13.3 and are known to be highly polymorphic. These genetic variations can change an individual’s susceptibility to carcinogens and toxins as well as affect the toxicity and efficacy of certain drugs. Null mutations of this class mu gene have been linked with an increase in a number of cancers, likely due to an increased susceptibility to environmental toxins and carcinogens. Multiple protein isoforms are encoded by transcript variants of this gene. 2944 ENSG00000134184 GSTM1 NA
neurexophilin 3 NA 11248 ENSG00000182575 NXPH3 NA
NA NA ENSG00000257433 ENSG00000257433 RP1-197B17.3 NA
# Save the Ensembl IDs queried for factor 11, one ID per line.
write.table(as.factor(out$query),
            paste0("../utilities/GTEX2013_sparse_load_voom/gene_names_clus_", 11, ".txt"),
            col.names = FALSE, row.names = FALSE, quote = FALSE);
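
The export above writes every queried ID, including any that mygene could not resolve (such rows carry notfound = TRUE in the query result). A minimal sketch of a filtered export follows, assuming the result has been converted with as.data.frame(). The helper name resolved_ids, the toy data frame out_demo, and the temporary output path are hypothetical and only illustrate the idea; the three IDs are copied from the tables in this report.

```r
# Hypothetical helper: return the unique query IDs that were resolved.
# If the result has no `notfound` column, every query is treated as resolved.
resolved_ids <- function(res) {
  if (!"notfound" %in% names(res)) {
    return(unique(res$query))
  }
  # `notfound` is NA for resolved rows and TRUE for unresolved ones.
  unique(res$query[!(res$notfound %in% TRUE)])
}

# Toy stand-in for as.data.frame(out); no network access is needed.
out_demo <- data.frame(
  query    = c("ENSG00000137801", "ENSG00000087245", "ENSG00000255813"),
  symbol   = c("THBS1", "MMP2", NA),
  notfound = c(NA, NA, TRUE),
  stringsAsFactors = FALSE
)

resolved <- resolved_ids(out_demo)

# Write to a temporary file here; a real run would use the report's own path.
out_file <- tempfile(fileext = ".txt")
write.table(resolved, out_file,
            col.names = FALSE, row.names = FALSE, quote = FALSE)
```

Whether unresolved IDs should be dropped depends on the downstream tool; some enrichment tools accept raw Ensembl IDs directly, in which case the unfiltered export is the better input.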

Factor 12 Annotations

out <- mygene::queryMany(gene_list[12,],  scopes="ensembl.gene", fields=c("name", "summary", "symbol"), species="human");
## Finished
## Pass returnall=TRUE to return lists of duplicate or missing query terms.
kable(as.data.frame(out))
X_id summary name symbol query notfound
1488 This gene produces alternative transcripts encoding two distinct proteins. One protein is a transcriptional repressor, while the other isoform is a major component of specialized synapses known as synaptic ribbons. Both proteins contain a NAD+ binding domain similar to NAD+-dependent 2-hydroxyacid dehydrogenases. A portion of the 3’ untranslated region was used to map this gene to chromosome 21q21.3; however, it was noted that similar loci elsewhere in the genome are likely. Blast analysis shows that this gene is present on chromosome 10. Several transcript variants encoding two different isoforms have been found for this gene. C-terminal binding protein 2 CTBP2 ENSG00000175029 NA
5319 This gene encodes a secreted member of the phospholipase A2 (PLA2) class of enzymes, which is produced by the pancreatic acinar cells. The encoded calcium-dependent enzyme catalyzes the hydrolysis of the sn-2 position of membrane glycerophospholipids to release arachidonic acid (AA) and lysophospholipids. AA is subsequently converted by downstream metabolic enzymes to several bioactive lipophilic compounds (eicosanoids), including prostaglandins (PGs) and leukotrienes (LTs). The enzyme may be involved in several physiological processes including cell contraction, cell proliferation and pathological response. phospholipase A2 group IB PLA2G1B ENSG00000170890 NA
8660 This gene encodes the insulin receptor substrate 2, a cytoplasmic signaling molecule that mediates effects of insulin, insulin-like growth factor 1, and other cytokines by acting as a molecular adaptor between diverse receptor tyrosine kinases and downstream effectors. The product of this gene is phosphorylated by the insulin receptor tyrosine kinase upon receptor stimulation, as well as by an interleukin 4 receptor-associated kinase in response to IL4 treatment. insulin receptor substrate 2 IRS2 ENSG00000185950 NA
100129550 NA uncharacterized LOC100129550 LOC100129550 ENSG00000273033 NA
51635 This gene encodes a member of the short-chain dehydrogenases/reductases (SDR) family, which has over 46,000 members. Members in this family are enzymes that metabolize many different compounds, such as steroid hormones, prostaglandins, retinoids, lipids and xenobiotics. dehydrogenase/reductase 7 DHRS7 ENSG00000100612 NA
51621 KLF13 belongs to a family of transcription factors that contain 3 classical zinc finger DNA-binding domains consisting of a zinc atom tetrahedrally coordinated by 2 cysteines and 2 histidines (C2H2 motif). These transcription factors bind to GC-rich sequences and related GT and CACCC boxes (Scohy et al., 2000 [PubMed 11087666]). Kruppel like factor 13 KLF13 ENSG00000169926 NA
64759 NA tensin 3 TNS3 ENSG00000136205 NA
11060 This gene encodes a member of the Nedd4 family of E3 ligases, which play an important role in protein ubiquitination. The encoded protein contains four WW domains and may play a role in multiple processes including chondrogenesis and the regulation of oncogenic signaling pathways via interactions with Smad proteins and the tumor suppressor PTEN. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene, and a pseudogene of this gene is located on the long arm of chromosome 10. WW domain containing E3 ubiquitin protein ligase 2 WWP2 ENSG00000198373 NA
80762 The protein encoded by this gene belongs to a small group of evolutionarily conserved proteins with three transmembrane domains. It is a potential target for ubiquitination by the Nedd4 family of proteins. This protein is thought to be part of a family of integral Golgi membrane proteins. Nedd4 family interacting protein 1 NDFIP1 ENSG00000131507 NA
6304 This gene encodes a matrix protein which binds nuclear matrix and scaffold-associating DNAs through a unique nuclear architecture. The protein recruits chromatin-remodeling factors in order to regulate chromatin structure and gene expression. SATB homeobox 1 SATB1 ENSG00000182568 NA
6809 The gene is a member of the syntaxin family. The encoded protein is targeted to the apical membrane of epithelial cells where it forms clusters and is important in establishing and maintaining polarity necessary for protein trafficking involving vesicle fusion and exocytosis. Alternative splicing results in multiple transcript variants. syntaxin 3 STX3 ENSG00000166900 NA
57515 NA serine incorporator 1 SERINC1 ENSG00000111897 NA
ENSG00000235027 NA NA AC068580.6 ENSG00000235027 NA
54893 NA myotubularin related protein 10 MTMR10 ENSG00000166912 NA
10618 This gene encodes a type I integral membrane protein that is localized to the trans-Golgi network, a major sorting station for secretory and membrane proteins. The encoded protein cycles between early endosomes and the trans-Golgi network, and may play a role in exocytic vesicle formation. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene. trans-golgi network protein 2 TGOLN2 ENSG00000152291 NA
65010 This gene belongs to the solute carrier 26 family, whose members encode anion transporter proteins. This particular family member encodes a protein involved in transporting chloride, oxalate, sulfate and bicarbonate. Alternatively spliced transcript variants encoding distinct isoforms have been described. solute carrier family 26 member 6 SLC26A6 ENSG00000225697 NA
ENSG00000225313 NA NA RP11-415J8.3 ENSG00000225313 NA
2776 This locus encodes a guanine nucleotide-binding protein. The encoded protein, an alpha subunit in the Gq class, couples a seven-transmembrane domain receptor to activation of phospholipase C-beta. Mutations at this locus have been associated with problems in platelet activation and aggregation. A related pseudogene exists on chromosome 2. G protein subunit alpha q GNAQ ENSG00000156052 NA
334 This gene encodes amyloid precursor-like protein 2 (APLP2), which is a member of the APP (amyloid precursor protein) family including APP, APLP1 and APLP2. This protein is ubiquitously expressed. It contains heparin-, copper- and zinc-binding domains at the N-terminus, BPTI/Kunitz inhibitor and E2 domains in the middle region, and transmembrane and intracellular domains at the C-terminus. This protein interacts with major histocompatibility complex (MHC) class I molecules. The synergy of this protein and the APP is required to mediate neuromuscular transmission, spatial learning and synaptic plasticity. This protein has been implicated in the pathogenesis of Alzheimer’s disease. Multiple alternatively spliced transcript variants encoding different isoforms have been identified. amyloid beta precursor like protein 2 APLP2 ENSG00000084234 NA
80344 This gene encodes a WD repeat-containing protein that interacts with the COP9 signalosome, a macromolecular complex that interacts with cullin-RING E3 ligases and regulates their activity by hydrolyzing cullin-Nedd8 conjugates. Multiple alternatively spliced transcript variants have been found for this gene. DDB1 and CUL4 associated factor 11 DCAF11 ENSG00000100897 NA
2289 The protein encoded by this gene is a member of the immunophilin protein family, which play a role in immunoregulation and basic cellular processes involving protein folding and trafficking. This encoded protein is a cis-trans prolyl isomerase that binds to the immunosuppressants FK506 and rapamycin. It is thought to mediate calcineurin inhibition. It also interacts functionally with mature hetero-oligomeric progesterone receptor complexes along with the 90 kDa heat shock protein and P23 protein. This gene has been found to have multiple polyadenylation sites. Alternative splicing results in multiple transcript variants. FK506 binding protein 5 FKBP5 ENSG00000096060 NA
64855 NA family with sequence similarity 129 member B FAM129B ENSG00000136830 NA
ENSG00000271643 NA NA RP11-10C24.3 ENSG00000271643 NA
100505635 NA uncharacterized LOC100505635 LOC100505635 ENSG00000235033 NA
3556 Interleukin 1 induces synthesis of acute phase and proinflammatory proteins during infection, tissue damage, or stress, by forming a complex at the cell membrane with an interleukin 1 receptor and an accessory protein. This gene encodes the interleukin 1 receptor accessory protein. The protein is a necessary part of the interleukin 1 receptor complex which initiates signalling events that result in the activation of interleukin 1-responsive genes. Alternative splicing of this gene results in two transcript variants encoding two different isoforms, one membrane-bound and one soluble. The ratio of soluble to membrane-bound forms increases during acute-phase induction or stress. interleukin 1 receptor accessory protein IL1RAP ENSG00000196083 NA
NA NA NA NA ENSG00000255813 TRUE
ENSG00000271862 NA NA RP11-343L5.2 ENSG00000271862 NA
150967 NA DKFZp434H1419 PKI55 ENSG00000260804 NA
9788 NA metastasis suppressor 1 MTSS1 ENSG00000170873 NA
ENSG00000257715 NA NA RP11-256L6.2 ENSG00000257715 NA
1509 This gene encodes a member of the A1 family of peptidases. The encoded preproprotein is proteolytically processed to generate multiple protein products. These products include the cathepsin D light and heavy chains, which heterodimerize to form the mature enzyme. This enzyme exhibits pepsin-like activity and plays a role in protein turnover and in the proteolytic activation of hormones and growth factors. Mutations in this gene play a causal role in neuronal ceroid lipofuscinosis-10 and may be involved in the pathogenesis of several other diseases, including breast cancer and possibly Alzheimer’s disease. cathepsin D CTSD ENSG00000117984 NA
ENSG00000242960 NA ferritin, heavy polypeptide 1 pseudogene 23 FTH1P23 ENSG00000242960 NA
84925 This gene encodes a membrane-bound protein from the major facilitator superfamily of transporters. Disruption of this gene by translocation has been associated with haplo-insufficiency and renal cell carcinomas. Alternatively spliced transcript variants have been described, but their biological validity has not yet been determined. disrupted in renal carcinoma 2 DIRC2 ENSG00000138463 NA
ENSG00000255670 NA NA RP11-253I19.3 ENSG00000255670 NA
54918 This gene belongs to the chemokine-like factor gene superfamily, a novel family that is similar to the chemokine and transmembrane 4 superfamilies. This gene is one of several chemokine-like factor genes located in a cluster on chromosome 3. This gene is widely expressed in many tissues, but the exact function of the encoded protein is unknown. CKLF like MARVEL transmembrane domain containing 6 CMTM6 ENSG00000091317 NA
5339 Plectin is a prominent member of an important family of structurally and in part functionally related proteins, termed plakins or cytolinkers, that are capable of interlinking different elements of the cytoskeleton. Plakins, with their multi-domain structure and enormous size, not only play crucial roles in maintaining cell and tissue integrity and orchestrating dynamic changes in cytoarchitecture and cell shape, but also serve as scaffolding platforms for the assembly, positioning, and regulation of signaling complexes (reviewed in PMID: 9701547, 11854008, and 17499243). Plectin is expressed as several protein isoforms in a wide range of cell types and tissues from a single gene located on chromosome 8 in humans (PMID: 8633055, 8698233). Until 2010, this locus was named plectin 1 (symbol PLEC1 in human; Plec1 in mouse and rat) and the gene product had been referred to as ‘hemidesmosomal protein 1’ or ‘plectin 1, intermediate filament binding 500kDa’. These names were superseded by plectin. The plectin gene locus in mouse on chromosome 15 has been analyzed in detail (PMID: 10556294, 14559777), revealing a genomic exon-intron organization with well over 40 exons spanning over 62 kb and an unusual 5’ transcript complexity of plectin isoforms. Eleven exons (1-1j) have been identified that alternatively splice directly into a common exon 2 which is the first exon to encode plectin’s highly conserved actin binding domain (ABD). Three additional exons (-1, 0a, and 0) splice into an alternative first coding exon (1c), and two additional exons (2alpha and 3alpha) are optionally spliced within the exons encoding the actin binding domain (exons 2-8). Analysis of the human locus has identified eight of the eleven alternative 5’ exons found in mouse and rat (PMID: 14672974); exons 1i, 1j and 1h have not been confirmed in human. Furthermore, isoforms lacking the central rod domain encoded by exon 31 have been detected in mouse (PMID: 10556294), rat (PMID: 9177781), and human (PMID: 11441066, 10780662, 20052759). The short alternative amino-terminal sequences encoded by the different first exons direct the targeting of the various isoforms to distinct subcellular locations (PMID: 14559777). As the expression of specific plectin isoforms was found to be dependent on cell type (tissue) and stage of development (PMID: 10556294, 12542521, 17389230) it appears that each cell type (tissue) contains a unique set (proportion and composition) of plectin isoforms, as if custom-made for specific requirements of the particular cells. Concordantly, individual isoforms were found to carry out distinct and specific functions (PMID: 14559777, 12542521, 18541706). In 1996, a number of groups reported that patients suffering from epidermolysis bullosa simplex with muscular dystrophy (EBS-MD) lacked plectin expression in skin and muscle tissues due to defects in the plectin gene (PMID: 8698233, 8941634, 8636409, 8894687, 8696340). Two other subtypes of plectin-related EBS have been described: EBS-pyloric atresia (PA) and EBS-Ogna. For reviews of plectin-related diseases see PMID: 15810881, 19945614. Mutations in the plectin gene related to human diseases should be named based on the position in NM_000445 (variant 1, isoform 1c), unless the mutation is located within one of the other alternative first exons, in which case the position in the respective Reference Sequence should be used. plectin PLEC ENSG00000178209 NA
3732 This metastasis suppressor gene product is a membrane glycoprotein that is a member of the transmembrane 4 superfamily. Expression of this gene has been shown to be downregulated in tumor progression of human cancers and can be activated by p53 through a consensus binding sequence in the promoter. Its expression and that of p53 are strongly correlated, and the loss of expression of these two proteins is associated with poor survival for prostate cancer patients. Two alternatively spliced transcript variants encoding distinct isoforms have been found for this gene. CD82 molecule CD82 ENSG00000085117 NA
90 Activins are dimeric growth and differentiation factors which belong to the transforming growth factor-beta (TGF-beta) superfamily of structurally related signaling proteins. Activins signal through a heteromeric complex of receptor serine kinases which include at least two type I (I and IB) and two type II (II and IIB) receptors. These receptors are all transmembrane proteins, composed of a ligand-binding extracellular domain with cysteine-rich region, a transmembrane domain, and a cytoplasmic domain with predicted serine/threonine specificity. Type I receptors are essential for signaling; and type II receptors are required for binding ligands and for expression of type I receptors. Type I and II receptors form a stable complex after ligand binding, resulting in phosphorylation of type I receptors by type II receptors. This gene encodes activin A type I receptor which signals a particular transcriptional response in concert with activin type II receptors. Mutations in this gene are associated with fibrodysplasia ossificans progressiva. activin A receptor type 1 ACVR1 ENSG00000115170 NA
NA NA NA NA ENSG00000256845 TRUE
137886 NA UBX domain protein 2B UBXN2B ENSG00000215114 NA
206358 This gene encodes a member of the eukaryote-specific amino acid/auxin permease (AAAP) 1 transporter family. The encoded protein functions as a proton-dependent, small amino acid transporter. This gene is clustered with related family members on chromosome 5q33.1. Alternative splicing results in multiple transcript variants. solute carrier family 36 member 1 SLC36A1 ENSG00000123643 NA
57506 This gene encodes an intermediary protein necessary in the virus-triggered beta interferon signaling pathways. It is required for activation of transcription factors which regulate expression of beta interferon and contributes to antiviral immunity. Multiple transcript variants encoding different isoforms have been found for this gene. mitochondrial antiviral signaling protein MAVS ENSG00000088888 NA
9802 This gene encodes a proline-rich protein which interacts with the deleted in azoospermia (DAZ) and the deleted in azoospermia-like gene through the DAZ-like repeats. This protein also interacts with the transforming growth factor-beta signaling molecule SARA (Smad anchor for receptor activation), eukaryotic initiation factor 4G, and an E3 ubiquitinase that regulates its stability in splicing factor containing nuclear speckles. The encoded protein may function in various biological and pathological processes including spermatogenesis, cell signaling and transcription regulation, formation of stress granules during translation arrest, RNA splicing, and pathogenesis of multiple myeloma. Multiple transcript variants encoding different isoforms have been found for this gene. DAZ associated protein 2 DAZAP2 ENSG00000183283 NA
ENSG00000256448 NA NA RP11-809N8.4 ENSG00000256448 NA
ENSG00000255680 NA NA RP11-732A19.9 ENSG00000255680 NA
4942 This gene encodes the mitochondrial enzyme ornithine aminotransferase, which is a key enzyme in the pathway that converts arginine and ornithine into the major excitatory and inhibitory neurotransmitters glutamate and GABA. Mutations that result in a deficiency of this enzyme cause the autosomal recessive eye disease Gyrate Atrophy. Alternatively spliced transcript variants encoding different isoforms have been described. Related pseudogenes have been defined on the X chromosome. ornithine aminotransferase OAT ENSG00000065154 NA
9375 This gene encodes a member of the transmembrane 9 superfamily. The encoded 76 kDa protein localizes to early endosomes in human cells. The encoded protein possesses a conserved and highly hydrophobic C-terminal domain which contains nine transmembrane domains. The protein may play a role in small molecule transport or act as an ion channel. A pseudogene associated with this gene is located on the X chromosome. transmembrane 9 superfamily member 2 TM9SF2 ENSG00000125304 NA
9637 This gene is an ortholog of the C. elegans unc-76 gene, which is necessary for normal axonal bundling and elongation within axon bundles. Other orthologs include the rat gene that encodes zygin II, which can bind to synaptotagmin. fasciculation and elongation protein zeta 2 FEZ2 ENSG00000171055 NA
ENSG00000233739 NA NA RP5-1039K5.13 ENSG00000233739 NA
63971 This gene encodes a member of the kinesin family of microtubule-based motor proteins that function in the positioning of endosomes. This family member can direct mannose-6-phosphate receptor-containing vesicles from the trans-Golgi network to the plasma membrane, and it is necessary for the steady-state distribution of late endosomes/lysosomes. It is also required for the translocation of FYVE-CENT and TTC19 from the centrosome to the midbody during cytokinesis, and it plays a role in melanosome maturation. Alternative splicing of this gene results in multiple transcript variants. kinesin family member 13A KIF13A ENSG00000137177 NA
10079 NA ATPase phospholipid transporting 9A (putative) ATP9A ENSG00000054793 NA
6655 NA SOS Ras/Rho guanine nucleotide exchange factor 2 SOS2 ENSG00000100485 NA
9778 NA KIAA0232 KIAA0232 ENSG00000170871 NA
388115 NA chromosome 15 open reading frame 52 C15orf52 ENSG00000188549 NA
ENSG00000257831 NA NA RP11-596D21.1 ENSG00000257831 NA
54414 This gene encodes an enzyme which removes 9-O-acetylation modifications from sialic acids. Mutations in this gene are associated with susceptibility to autoimmune disease 6. Multiple transcript variants encoding different isoforms, found either in the cytosol or in the lysosome, have been found for this gene. sialic acid acetylesterase SIAE ENSG00000110013 NA
7009 NA transmembrane BAX inhibitor motif containing 6 TMBIM6 ENSG00000139644 NA
55755 This gene encodes a regulator of CDK5 (cyclin-dependent kinase 5) activity. The protein encoded by this gene is localized to the centrosome and Golgi complex, interacts with CDK5R1 and pericentrin (PCNT), plays a role in centriole engagement and microtubule nucleation, and has been linked to primary microcephaly and Alzheimer’s disease. Alternative splicing results in multiple transcript variants. CDK5 regulatory subunit associated protein 2 CDK5RAP2 ENSG00000136861 NA
9580 This gene encodes a member of the SOX (SRY-related HMG-box) family of transcription factors involved in the regulation of embryonic development and in the determination of cell fate. The encoded protein may act as a transcriptional regulator after forming a protein complex with other proteins. It has also been determined to be a type-1 diabetes autoantigen, also known as islet cell antibody 12. SRY-box 13 SOX13 ENSG00000143842 NA
ENSG00000254693 NA NA RP11-58K22.5 ENSG00000254693 NA
10677 The protein encoded by this gene is a member of the gelsolin/villin family of actin regulatory proteins. This protein has structural similarity to villin. It binds actin and may play a role in the development of neuronal cells that form ganglia. advillin AVIL ENSG00000135407 NA
9728 NA SECIS binding protein 2 like SECISBP2L ENSG00000138593 NA
ENSG00000272659 NA NA AP000295.10 ENSG00000272659 NA
9236 NA cell cycle progression 1 CCPG1 ENSG00000260916 NA
58472 The protein encoded by this gene may function in mitochondria to catalyze the conversion of sulfide to persulfides, thereby decreasing toxic concentrations of sulfide. Alternative splicing results in multiple transcript variants that encode the same protein. sulfide quinone reductase-like (yeast) SQRDL ENSG00000137767 NA
9826 Rho GTPases play a fundamental role in numerous cellular processes that are initiated by extracellular stimuli that work through G protein coupled receptors. The encoded protein may form a complex with G proteins and stimulate Rho-dependent signals. A similar protein in rat interacts with glutamate transporter EAAT4 and modulates its glutamate transport activity. Expression of the rat protein induces the reorganization of the actin cytoskeleton and its overexpression induces the formation of membrane ruffling and filopodia. Two alternative transcripts encoding different isoforms have been described. Rho guanine nucleotide exchange factor 11 ARHGEF11 ENSG00000132694 NA
146223 This gene belongs to the chemokine-like factor gene superfamily, a novel family that is similar to the chemokine and the transmembrane 4 superfamilies of signaling molecules. This gene is one of several chemokine-like factor genes located in a cluster on chromosome 16. Alternatively spliced transcript variants encoding different isoforms have been identified. CKLF like MARVEL transmembrane domain containing 4 CMTM4 ENSG00000183723 NA
NA NA NA NA ENSG00000272091 TRUE
9936 CD302 is a C-type lectin receptor involved in cell adhesion and migration, as well as endocytosis and phagocytosis (Kato et al., 2007 [PubMed 17947679]). CD302 molecule CD302 ENSG00000241399 NA
115548 NA FCH domain only 2 FCHO2 ENSG00000157107 NA
3092 The product of this gene is a membrane-associated protein that functions in clathrin-mediated endocytosis and protein trafficking within the cell. The encoded protein binds to the huntingtin protein in the brain; this interaction is lost in Huntington’s disease. Alternative splicing results in multiple transcript variants. huntingtin interacting protein 1 HIP1 ENSG00000127946 NA
ENSG00000231025 NA NA RP11-175O19.4 ENSG00000231025 NA
57222 This gene encodes a cycling membrane protein which is an endoplasmic reticulum-golgi intermediate compartment (ERGIC) protein which interacts with other members of this protein family to increase their turnover. endoplasmic reticulum-golgi intermediate compartment 1 ERGIC1 ENSG00000113719 NA
NA NA NA NA ENSG00000203305 TRUE
ENSG00000259468 NA NA RP11-1084A12.2 ENSG00000259468 NA
8741 The protein encoded by this gene is a member of the tumor necrosis factor (TNF) ligand family. This protein is a ligand for TNFRSF17/BCMA, a member of the TNF receptor family. This protein and its receptor are both found to be important for B cell development. In vitro experiments suggested that this protein may be able to induce apoptosis through its interaction with other TNF receptor family proteins such as TNFRSF6/FAS and TNFRSF14/HVEM. Alternative splicing results in multiple transcript variants. Some transcripts that skip the last exon of the upstream gene (TNFSF12) and continue into the second exon of this gene have been identified; such read-through transcripts are contained in GeneID 407977, TNFSF12-TNFSF13. tumor necrosis factor superfamily member 13 TNFSF13 ENSG00000161955 NA
ENSG00000234975 NA ferritin, heavy polypeptide 1 pseudogene 2 FTH1P2 ENSG00000234975 NA
ENSG00000232187 NA ferritin, heavy polypeptide 1 pseudogene 7 FTH1P7 ENSG00000232187 NA
5660 This gene encodes a highly conserved preproprotein that is proteolytically processed to generate four main cleavage products including saposins A, B, C, and D. Each domain of the precursor protein is approximately 80 amino acid residues long with nearly identical placement of cysteine residues and glycosylation sites. Saposins A-D localize primarily to the lysosomal compartment where they facilitate the catabolism of glycosphingolipids with short oligosaccharide groups. The precursor protein exists both as a secretory protein and as an integral membrane protein and has neurotrophic activities. Mutations in this gene have been associated with Gaucher disease and metachromatic leukodystrophy. Alternative splicing results in multiple transcript variants, at least one of which encodes an isoform that is proteolytically processed. prosaposin PSAP ENSG00000197746 NA
ENSG00000227201 NA calponin 2 pseudogene 1 CNN2P1 ENSG00000227201 NA
23052 NA endonuclease domain containing 1 ENDOD1 ENSG00000149218 NA
ENSG00000261329 NA NA CTD-2049O4.1 ENSG00000261329 NA
25778 This gene encodes a dual serine/threonine and tyrosine protein kinase which is expressed in multiple tissues. It is thought to function as a regulator of cell death. Multiple transcript variants encoding different isoforms have been found for this gene. dual serine/threonine and tyrosine protein kinase DSTYK ENSG00000133059 NA
ENSG00000261064 NA NA RP11-1000B6.3 ENSG00000261064 NA
11010 This gene encodes a protein with similarity to both the pathogenesis-related protein (PR) superfamily and the cysteine-rich secretory protein (CRISP) family. Increased expression of this gene is associated with myelomocytic differentiation in macrophage and decreased expression of this gene through gene methylation is associated with prostate cancer. The protein has proapoptotic activities in prostate and bladder cancer cells. This gene is a member of a cluster on chromosome 12 containing two other similar genes. Alternatively spliced variants which encode different protein isoforms have been described; however, not all variants have been fully characterized. GLI pathogenesis related 1 GLIPR1 ENSG00000139278 NA
2495 This gene encodes the heavy subunit of ferritin, the major intracellular iron storage protein in prokaryotes and eukaryotes. It is composed of 24 subunits of the heavy and light ferritin chains. Variation in ferritin subunit composition may affect the rates of iron uptake and release in different tissues. A major function of ferritin is the storage of iron in a soluble and nontoxic state. Defects in ferritin proteins are associated with several neurodegenerative diseases. This gene has multiple pseudogenes. Several alternatively spliced transcript variants have been observed, but their biological validity has not been determined. ferritin heavy chain 1 FTH1 ENSG00000167996 NA
51567 This gene encodes a member of a superfamily of divalent cation-dependent phosphodiesterases. The encoded protein associates with CD40, tumor necrosis factor (TNF) receptor-75 and TNF receptor associated factors (TRAFs), and inhibits nuclear factor-kappa-B activation. This protein has sequence and structural similarities with APE1 endonuclease, which is involved in both DNA repair and the activation of transcription factors. tyrosyl-DNA phosphodiesterase 2 TDP2 ENSG00000111802 NA
51094 This gene encodes a protein which acts as a receptor for adiponectin, a hormone secreted by adipocytes which regulates fatty acid catabolism and glucose levels. Binding of adiponectin to the encoded protein results in activation of an AMP-activated kinase signaling pathway which affects levels of fatty acid oxidation and insulin sensitivity. A pseudogene of this gene is located on chromosome 14. Multiple alternatively spliced transcript variants have been found for this gene. adiponectin receptor 1 ADIPOR1 ENSG00000159346 NA
ENSG00000248223 NA NA CTD-2139B15.2 ENSG00000248223 NA
ENSG00000267904 NA NA CTC-429P9.5 ENSG00000267904 NA
51528 NA JNK1/MAPK8-associated membrane protein JKAMP ENSG00000050130 NA
121274 NA zinc finger protein 641 ZNF641 ENSG00000167528 NA
1486 Chitobiase is a lysosomal glycosidase involved in degradation of asparagine-linked oligosaccharides on glycoproteins (Aronson and Kuranda, 1989 [PubMed 2531691]). chitobiase CTBS ENSG00000117151 NA
26224 This gene encodes a member of the F-box protein family which is characterized by an approximately 40 amino acid motif, the F-box. The F-box proteins constitute one of the four subunits of ubiquitin protein ligase complex called SCFs (SKP1-cullin-F-box), which function in phosphorylation-dependent ubiquitination. The F-box proteins are divided into 3 classes: Fbws containing WD-40 domains, Fbls containing leucine-rich repeats, and Fbxs containing either different protein-protein interaction modules or no recognizable motifs. The protein encoded by this gene belongs to the Fbls class and, in addition to an F-box, contains several tandem leucine-rich repeats and is localized in the nucleus. F-box and leucine rich repeat protein 3 FBXL3 ENSG00000005812 NA
ENSG00000232909 NA NA RP3-510O8.4 ENSG00000232909 NA
253725 NA family with sequence similarity 21 member C FAM21C ENSG00000172661 NA
81545 This gene encodes a large protein that contains an F-box domain and may participate in protein ubiquitination. The encoded protein is a transcriptional co-activator of Krueppel-like factor 7 (Klf7). A heterozygous mutation in this gene was found in individuals with autosomal dominant distal hereditary motor neuronopathy type IID. There is a pseudogene for this gene on chromosome 4. Alternative splicing results in multiple transcript variants. F-box protein 38 FBXO38 ENSG00000145868 NA
10367 This gene encodes an essential regulator of mitochondrial Ca2+ uptake under basal conditions. The encoded protein interacts with the mitochondrial calcium uniporter, a mitochondrial inner membrane Ca2+ channel, and is essential in preventing mitochondrial Ca2+ overload, which can cause excessive production of reactive oxygen species and cell stress. Alternatively spliced transcript variants encoding different isoforms have been described. mitochondrial calcium uptake 1 MICU1 ENSG00000107745 NA
10970 NA cytoskeleton-associated protein 4 CKAP4 ENSG00000136026 NA
5681 NA protein serine kinase H1 PSKH1 ENSG00000159792 NA
# Write the Factor 12 query IDs to a text file, one per line.
write.table(as.factor(out$query),
            paste0("../utilities/GTEX2013_sparse_load_voom/gene_names_clus_", 12, ".txt"),
            col.names = FALSE, row.names = FALSE, quote = FALSE);
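
The query results in these tables include rows flagged as not found (`notfound` is TRUE) and, for some factors, the same query ID returned more than once. The following is a minimal sketch, not part of the original analysis, of how such rows could be dropped before the IDs are written out. It assumes a data frame shaped like `as.data.frame(out)`, with a `query` column and an optional `notfound` column; the helper name `clean_query_ids` and the toy data frame are hypothetical.

```r
# Hypothetical helper: keep query IDs that were found, one entry per ID.
# `res` is assumed to have a `query` column and, optionally, a `notfound`
# column that is TRUE for unmatched IDs and NA otherwise.
clean_query_ids <- function(res) {
  if ("notfound" %in% names(res)) {
    # Keep rows where notfound is NA (matched) or FALSE
    res <- res[is.na(res$notfound) | !res$notfound, , drop = FALSE]
  }
  # Collapse repeated query IDs to a single entry, preserving order
  unique(as.character(res$query))
}

# Toy data frame mimicking the shape of as.data.frame(out)
toy <- data.frame(
  query    = c("ENSG00000271738", "ENSG00000172201",
               "ENSG00000182230", "ENSG00000182230"),
  symbol   = c(NA, "ID4", "FAM153B", "LOC100507387"),
  notfound = c(TRUE, NA, NA, NA),
  stringsAsFactors = FALSE
)

ids <- clean_query_ids(toy)
print(ids)
```

The cleaned vector could then be passed to `write.table` in place of `as.factor(out$query)`, if unmatched and repeated IDs are not wanted in the output file.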

Factor 13 Annotations

out <- mygene::queryMany(gene_list[13,],  scopes="ensembl.gene", fields=c("name", "summary", "symbol"), species="human");
## Finished
## Pass returnall=TRUE to return lists of duplicate or missing query terms.
kable(as.data.frame(out))
notfound query X_id symbol name summary
TRUE ENSG00000271738 NA NA NA NA
NA ENSG00000272512 ENSG00000272512 RP11-54O7.17 NA NA
NA ENSG00000172201 3400 ID4 inhibitor of DNA binding 4, HLH protein This gene encodes a member of the inhibitor of DNA binding (ID) protein family. These proteins are basic helix-loop-helix transcription factors which can act as tumor suppressors but lack DNA binding activity. Consequently, the activity of the encoded protein depends on the protein binding partner.
NA ENSG00000173175 111 ADCY5 adenylate cyclase 5 This gene encodes a member of the membrane-bound adenylyl cyclase enzymes. Adenylyl cyclases mediate G protein-coupled receptor signaling through the synthesis of the second messenger cAMP. Activity of the encoded protein is stimulated by the Gs alpha subunit of G protein-coupled receptors and is inhibited by protein kinase A, calcium and Gi alpha subunits. Single nucleotide polymorphisms in this gene may be associated with low birth weight and type 2 diabetes. Alternatively spliced transcript variants that encode different isoforms have been observed for this gene.
NA ENSG00000229732 ENSG00000229732 AC019349.5 NA NA
NA ENSG00000122378 84293 FAM213A family with sequence similarity 213 member A NA
NA ENSG00000133401 23037 PDZD2 PDZ domain containing 2 The protein encoded by this gene contains six PDZ domains and shares sequence similarity with pro-interleukin-16 (pro-IL-16). Like pro-IL-16, the encoded protein localizes to the endoplasmic reticulum and is thought to be cleaved by a caspase to produce a secreted peptide containing two PDZ domains. In addition, this gene is upregulated in primary prostate tumors and may be involved in the early stages of prostate tumorigenesis.
NA ENSG00000198300 5178 PEG3 paternally expressed 3 In human, ZIM2 and PEG3 are treated as two distinct genes though they share multiple 5’ exons and a common promoter and both genes are paternally expressed (PMID:15203203). Alternative splicing events connect their shared 5’ exons either with the remaining 4 exons unique to ZIM2, or with the remaining 2 exons unique to PEG3. In contrast, in other mammals ZIM2 does not undergo imprinting and, in mouse, cow, and likely other mammals as well, the ZIM2 and PEG3 genes do not share exons. Human PEG3 protein belongs to the Kruppel C2H2-type zinc finger protein family. PEG3 may play a role in cell proliferation and p53-mediated apoptosis. PEG3 has also shown tumor suppressor activity and tumorigenesis in glioma and ovarian cells. Alternative splicing of this PEG3 gene results in multiple transcript variants encoding distinct isoforms.
NA ENSG00000157404 3815 KIT KIT proto-oncogene receptor tyrosine kinase This gene encodes the human homolog of the proto-oncogene c-kit. C-kit was first identified as the cellular homolog of the feline sarcoma viral oncogene v-kit. This protein is a type 3 transmembrane receptor for MGF (mast cell growth factor, also known as stem cell factor). Mutations in this gene are associated with gastrointestinal stromal tumors, mast cell disease, acute myelogenous leukemia, and piebaldism. Multiple transcript variants encoding different isoforms have been found for this gene.
NA ENSG00000105088 93145 OLFM2 olfactomedin 2 NA
NA ENSG00000160145 8997 KALRN kalirin, RhoGEF kinase Huntington’s disease (HD), a neurodegenerative disorder characterized by loss of striatal neurons, is caused by an expansion of a polyglutamine tract in the HD protein huntingtin. This gene encodes a protein that interacts with the huntingtin-associated protein 1, which is a huntingtin binding protein that may function in vesicle trafficking.
NA ENSG00000272678 ENSG00000272678 RP11-797D24.4 NA NA
NA ENSG00000163485 134 ADORA1 adenosine A1 receptor The protein encoded by this gene is an adenosine receptor that belongs to the G-protein coupled receptor 1 family. There are 3 types of adenosine receptors, each with a specific pattern of ligand binding and tissue distribution, and together they regulate a diverse set of physiologic functions. The type A1 receptors inhibit adenylyl cyclase, and play a role in the fertilization process. Animal studies also suggest a role for A1 receptors in kidney function and ethanol intoxication. Transcript variants with alternative splicing in the 5’ UTR have been found for this gene.
NA ENSG00000173898 6712 SPTBN2 spectrin beta, non-erythrocytic 2 Spectrins are principal components of a cell’s membrane-cytoskeleton and are composed of two alpha and two beta spectrin subunits. The protein encoded by this gene (SPTBN2) is called spectrin beta non-erythrocytic 2 or beta-III spectrin. It is related to, but distinct from, the beta-II spectrin gene which is also known as spectrin beta non-erythrocytic 1 (SPTBN1). SPTBN2 regulates the glutamate signaling pathway by stabilizing the glutamate transporter EAAT4 at the surface of the plasma membrane. Mutations in this gene cause a form of spinocerebellar ataxia, SCA5, that is characterized by neurodegeneration, progressive locomotor incoordination, dysarthria, and uncoordinated eye movements.
NA ENSG00000171766 2628 GATM glycine amidinotransferase This gene encodes a mitochondrial enzyme that belongs to the amidinotransferase family. This enzyme is involved in creatine biosynthesis, whereby it catalyzes the transfer of a guanido group from L-arginine to glycine, resulting in guanidinoacetic acid, the immediate precursor of creatine. Mutations in this gene cause arginine:glycine amidinotransferase deficiency, an inborn error of creatine synthesis characterized by mental retardation, language impairment, and behavioral disorders.
NA ENSG00000025039 58528 RRAGD Ras related GTP binding D RRAGD is a monomeric guanine nucleotide-binding protein, or G protein. By binding GTP or GDP, small G proteins act as molecular switches in numerous cell processes and signaling pathways.
NA ENSG00000182902 83733 SLC25A18 solute carrier family 25 member 18 NA
NA ENSG00000103034 65009 NDRG4 NDRG family member 4 This gene is a member of the N-myc downregulated gene family which belongs to the alpha/beta hydrolase superfamily. The protein encoded by this gene is a cytoplasmic protein that is required for cell cycle progression and survival in primary astrocytes and may be involved in the regulation of mitogenic signalling in vascular smooth muscle cells. Alternative splicing results in multiple transcripts encoding different isoforms.
NA ENSG00000149809 7108 TM7SF2 transmembrane 7 superfamily member 2 NA
NA ENSG00000162373 79656 BEND5 BEN domain containing 5 NA
NA ENSG00000121690 91614 DEPDC7 DEP domain containing 7 NA
NA ENSG00000163209 6707 SPRR3 small proline rich protein 3 NA
NA ENSG00000260244 ENSG00000260244 RP11-588K22.2 NA NA
NA ENSG00000136237 9771 RAPGEF5 Rap guanine nucleotide exchange factor 5 Members of the RAS (see HRAS; MIM 190020) subfamily of GTPases function in signal transduction as GTP/GDP-regulated switches that cycle between inactive GDP- and active GTP-bound states. Guanine nucleotide exchange factors (GEFs), such as RAPGEF5, serve as RAS activators by promoting acquisition of GTP to maintain the active GTP-bound state and are the key link between cell surface receptors and RAS activation (Rebhun et al., 2000 [PubMed 10934204]).
NA ENSG00000204677 ENSG00000204677 FAM153C family with sequence similarity 153 member C NA
NA ENSG00000236609 54753 ZNF853 zinc finger protein 853 NA
NA ENSG00000169509 54544 CRCT1 cysteine rich C-terminal 1 NA
NA ENSG00000163864 349565 NMNAT3 nicotinamide nucleotide adenylyltransferase 3 This gene encodes a member of the nicotinamide/nicotinic acid mononucleotide adenylyltransferase family. These enzymes use ATP to catalyze the synthesis of nicotinamide adenine dinucleotide or nicotinic acid adenine dinucleotide from nicotinamide mononucleotide or nicotinic acid mononucleotide, respectively. The encoded protein is localized to mitochondria and may also play a neuroprotective role as a molecular chaperone. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene.
TRUE ENSG00000268358 NA NA NA NA
NA ENSG00000125378 652 BMP4 bone morphogenetic protein 4 This gene encodes a member of the bone morphogenetic protein (BMP) family of proteins, which is part of the transforming growth factor-beta (TGF-beta) superfamily. Members of the BMP family play an important role in bone and cartilage development. The encoded preproprotein is proteolytically processed to generate each subunit of the disulfide-linked homodimer. Mutations in this gene are associated with orofacial cleft and microphthalmia in human patients. The encoded protein may also be involved in the pathology of multiple cardiovascular diseases and human cancers. Alternative splicing results in multiple transcript variants.
NA ENSG00000125780 7053 TGM3 transglutaminase 3 Transglutaminases are enzymes that catalyze the crosslinking of proteins by epsilon-gamma glutamyl lysine isopeptide bonds. While the primary structure of transglutaminases is not conserved, they all have the same amino acid sequence at their active sites and their activity is calcium-dependent. The protein encoded by this gene consists of two polypeptide chains activated from a single precursor protein by proteolysis. The encoded protein is involved in the later stages of cell envelope formation in the epidermis and hair follicle.
NA ENSG00000182985 23705 CADM1 cell adhesion molecule 1 NA
NA ENSG00000164116 2982 GUCY1A3 guanylate cyclase 1, soluble, alpha 3 Soluble guanylate cyclases are heterodimeric proteins that catalyze the conversion of GTP to 3’,5’-cyclic GMP and pyrophosphate. The protein encoded by this gene is an alpha subunit of this complex and it interacts with a beta subunit to form the guanylate cyclase enzyme, which is activated by nitric oxide. Several transcript variants encoding a few different isoforms have been found for this gene.
NA ENSG00000259933 ENSG00000259933 RP11-304L19.1 NA NA
NA ENSG00000182230 202134 FAM153B family with sequence similarity 153 member B NA
NA ENSG00000182230 100507387 LOC100507387 uncharacterized LOC100507387 NA
NA ENSG00000186998 129080 EMID1 EMI domain containing 1 NA
NA ENSG00000106772 158471 PRUNE2 prune homolog 2 The protein encoded by this gene belongs to the B-cell CLL/lymphoma 2 and adenovirus E1B 19 kDa interacting family, whose members play roles in many cellular processes including apoptosis, cell transformation, and synaptic function. Several functions for this protein have been demonstrated including suppression of Ras homolog family member A activity, which results in reduced stress fiber formation and suppression of oncogenic cellular transformation. A high molecular weight isoform of this protein has also been shown to colocalize with Adaptor protein complex 2, beta-Adaptin and endodermal markers, suggesting an involvement in post-endocytic trafficking. In prostate cancer cells, this gene acts as a tumor suppressor and its expression is regulated by prostate cancer antigen 3, a non-protein coding gene on the opposite DNA strand in an intron of this gene. Prostate cancer antigen 3 regulates levels of this gene through formation of a double-stranded RNA that undergoes adenosine deaminase acting on RNA-dependent adenosine-to-inosine RNA editing. Alternative splicing results in multiple transcript variants.
NA ENSG00000124225 56937 PMEPA1 prostate transmembrane protein, androgen induced 1 This gene encodes a transmembrane protein that contains a Smad interacting motif (SIM). Expression of this gene is induced by androgens and transforming growth factor beta, and the encoded protein suppresses the androgen receptor and transforming growth factor beta signaling pathways through interactions with Smad proteins. Overexpression of this gene may play a role in multiple types of cancer. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene.
NA ENSG00000134121 10752 CHL1 cell adhesion molecule L1 like The protein encoded by this gene is a member of the L1 gene family of neural cell adhesion molecules. It is a neural recognition molecule that may be involved in signal transduction pathways. The deletion of one copy of this gene may be responsible for mental defects in patients with 3p- syndrome. This protein may also play a role in the growth of certain cancers. Alternate splicing results in both coding and non-coding variants.
NA ENSG00000135423 27165 GLS2 glutaminase 2 The protein encoded by this gene is a mitochondrial phosphate-activated glutaminase that catalyzes the hydrolysis of glutamine to stoichiometric amounts of glutamate and ammonia. Originally thought to be liver-specific, this protein has been found in other tissues as well. Alternative splicing results in multiple transcript variants that encode different isoforms.
NA ENSG00000154330 5239 PGM5 phosphoglucomutase 5 Phosphoglucomutases (EC 5.4.2.2), such as PGM5, are phosphotransferases involved in interconversion of glucose-1-phosphate and glucose-6-phosphate. PGM activity is essential in formation of carbohydrates from glucose-6-phosphate and in formation of glucose-6-phosphate from galactose and glycogen (Edwards et al., 1995 [PubMed 8586438]).
NA ENSG00000140451 80119 PIF1 PIF1 5’-to-3’ DNA helicase This gene encodes a DNA-dependent adenosine triphosphate (ATP)-metabolizing enzyme that functions as a 5’ to 3’ DNA helicase. The encoded protein can resolve G-quadruplex structures and RNA-DNA hybrids at the ends of chromosomes. It also prevents telomere elongation by inhibiting the actions of telomerase. Alternative splicing and the use of alternative start codons result in multiple isoforms that are differentially localized to either the mitochondria or the nucleus.
NA ENSG00000142178 150094 SIK1 salt inducible kinase 1 NA
NA ENSG00000130787 9026 HIP1R huntingtin interacting protein 1 related NA
NA ENSG00000130702 3911 LAMA5 laminin subunit alpha 5 This gene encodes one of the vertebrate laminin alpha chains. Laminins, a family of extracellular matrix glycoproteins, are the major noncollagenous constituent of basement membranes. They have been implicated in a wide variety of biological processes including cell adhesion, differentiation, migration, signaling, neurite outgrowth and metastasis. Laminins are composed of 3 nonidentical chains: laminin alpha, beta and gamma (formerly A, B1, and B2, respectively) and they form a cruciform structure consisting of 3 short arms, each formed by a different chain, and a long arm composed of all 3 chains. Each laminin chain is a multidomain protein encoded by a distinct gene. The protein encoded by this gene is the alpha-5 subunit of laminin-10 (laminin-511), laminin-11 (laminin-521) and laminin-15 (laminin-523).
NA ENSG00000008710 5310 PKD1 polycystin 1, transient receptor potential channel interacting This gene encodes a member of the polycystin protein family. The encoded glycoprotein contains a large N-terminal extracellular region, multiple transmembrane domains and a cytoplasmic C-tail. It is an integral membrane protein that functions as a regulator of calcium permeable cation channels and intracellular calcium homoeostasis. It is also involved in cell-cell/matrix interactions and may modulate G-protein-coupled signal-transduction pathways. It plays a role in renal tubular development, and mutations in this gene cause autosomal dominant polycystic kidney disease type 1 (ADPKD1). ADPKD1 is characterized by the growth of fluid-filled cysts that replace normal renal tissue and result in end-stage renal failure. Splice variants encoding different isoforms have been noted for this gene. Also, six pseudogenes, closely linked in a known duplicated region on chromosome 16p, have been described.
NA ENSG00000205336 9289 ADGRG1 adhesion G protein-coupled receptor G1 This gene encodes a member of the G protein-coupled receptor family and regulates brain cortical patterning. The encoded protein binds specifically to transglutaminase 2, a component of tissue and tumor stroma implicated as an inhibitor of tumor progression. Mutations in this gene are associated with a brain malformation known as bilateral frontoparietal polymicrogyria. Alternative splicing results in multiple transcript variants.
NA ENSG00000169758 123591 TMEM266 transmembrane protein 266 NA
TRUE ENSG00000184674 NA NA NA NA
NA ENSG00000169116 25849 PARM1 prostate androgen-regulated mucin-like protein 1 NA
NA ENSG00000188732 340277 FAM221A family with sequence similarity 221 member A NA
NA ENSG00000101096 4773 NFATC2 nuclear factor of activated T-cells 2 This gene is a member of the nuclear factor of activated T cells (NFAT) family. The product of this gene is a DNA-binding protein with a REL-homology region (RHR) and an NFAT-homology region (NHR). This protein is present in the cytosol and only translocates to the nucleus upon T cell receptor (TCR) stimulation, where it becomes a member of the nuclear factors of activated T cells transcription complex. This complex plays a central role in inducing gene transcription during the immune response. Alternate transcriptional splice variants encoding different isoforms have been characterized.
NA ENSG00000231584 ENSG00000231584 FAHD2CP fumarylacetoacetate hydrolase domain containing 2C, pseudogene NA
NA ENSG00000171772 93426 SYCE1 synaptonemal complex central element protein 1 NA
NA ENSG00000171401 3860 KRT13 keratin 13 The protein encoded by this gene is a member of the keratin gene family. The keratins are intermediate filament proteins responsible for the structural integrity of epithelial cells and are subdivided into cytokeratins and hair keratins. Most of the type I cytokeratins consist of acidic proteins which are arranged in pairs of heterotypic keratin chains. This type I cytokeratin is paired with keratin 4 and expressed in the suprabasal layers of non-cornified stratified epithelia. Mutations in this gene and keratin 4 have been associated with the autosomal dominant disorder White Sponge Nevus. The type I cytokeratins are clustered in a region of chromosome 17q21.2. Alternative splicing of this gene results in multiple transcript variants; however, not all variants have been described.
NA ENSG00000039068 999 CDH1 cadherin 1 This gene encodes a classical cadherin of the cadherin superfamily. Alternative splicing results in multiple transcript variants, at least one of which encodes a preproprotein that is proteolytically processed to generate the mature glycoprotein. This calcium-dependent cell-cell adhesion protein is comprised of five extracellular cadherin repeats, a transmembrane region and a highly conserved cytoplasmic tail. Mutations in this gene are correlated with gastric, breast, colorectal, thyroid and ovarian cancer. Loss of function of this gene is thought to contribute to cancer progression by increasing proliferation, invasion, and/or metastasis. The ectodomain of this protein mediates bacterial adhesion to mammalian cells and the cytoplasmic domain is required for internalization. This gene is present in a gene cluster with other members of the cadherin family on chromosome 16.
NA ENSG00000136848 153090 DAB2IP DAB2 interacting protein DAB2IP is a Ras (MIM 190020) GTPase-activating protein (GAP) that acts as a tumor suppressor. The DAB2IP gene is inactivated by methylation in prostate and breast cancers (Yano et al., 2005 [PubMed 15386433]).
NA ENSG00000183779 80139 ZNF703 zinc finger protein 703 NA
NA ENSG00000152583 8404 SPARCL1 SPARC like 1 NA
NA ENSG00000120278 57480 PLEKHG1 pleckstrin homology and RhoGEF domain containing G1 NA
NA ENSG00000099282 23555 TSPAN15 tetraspanin 15 The protein encoded by this gene is a member of the transmembrane 4 superfamily, also known as the tetraspanin family. Most of these members are cell-surface proteins that are characterized by the presence of four hydrophobic domains. The proteins mediate signal transduction events that play a role in the regulation of cell development, activation, growth and motility. The use of alternate polyadenylation sites has been found for this gene.
NA ENSG00000198719 28514 DLL1 delta like canonical Notch ligand 1 DLL1 is a human homolog of the Notch Delta ligand and is a member of the delta/serrate/jagged family. It plays a role in mediating cell fate decisions during hematopoiesis. It may play a role in cell-to-cell communication.
NA ENSG00000101447 81610 FAM83D family with sequence similarity 83 member D NA
TRUE ENSG00000257026 NA NA NA NA
NA ENSG00000136002 50649 ARHGEF4 Rho guanine nucleotide exchange factor 4 Rho GTPases play a fundamental role in numerous cellular processes that are initiated by extracellular stimuli that work through G protein coupled receptors. The protein encoded by this gene may form complex with G proteins and stimulate Rho-dependent signals. Multiple alternatively spliced transcript variants encoding different isoforms have been found, but the full-length nature of some variants has not been determined.
NA ENSG00000135744 183 AGT angiotensinogen The protein encoded by this gene, pre-angiotensinogen or angiotensinogen precursor, is expressed in the liver and is cleaved by the enzyme renin in response to lowered blood pressure. The resulting product, angiotensin I, is then cleaved by angiotensin converting enzyme (ACE) to generate the physiologically active enzyme angiotensin II. The protein is involved in maintaining blood pressure and in the pathogenesis of essential hypertension and preeclampsia. Mutations in this gene are associated with susceptibility to essential hypertension, and can cause renal tubular dysgenesis, a severe disorder of renal tubular development. Defects in this gene have also been associated with non-familial structural atrial fibrillation, and inflammatory bowel disease.
NA ENSG00000154721 58494 JAM2 junctional adhesion molecule 2 This gene belongs to the immunoglobulin superfamily, and the junctional adhesion molecule (JAM) family. The protein encoded by this gene is a type I membrane protein that is localized at the tight junctions of both epithelial and endothelial cells. It acts as an adhesive ligand for interacting with a variety of immune cell types, and may play a role in lymphocyte homing to secondary lymphoid organs. Alternatively spliced transcript variants have been found for this gene.
NA ENSG00000167191 51704 GPRC5B G protein-coupled receptor class C group 5 member B This gene encodes a member of the type 3 G protein-coupled receptor family. Members of this superfamily are characterized by a signature 7-transmembrane domain motif. The encoded protein may modulate insulin secretion and increased protein expression is associated with type 2 diabetes. Alternative splicing results in multiple transcript variants.
NA ENSG00000206535 348801 LNP1 leukemia NUP98 fusion partner 1 NA
NA ENSG00000137269 55227 LRRC1 leucine rich repeat containing 1 NA
NA ENSG00000108852 4355 MPP2 membrane palmitoylated protein 2 Palmitoylated membrane protein 2 is a member of a family of membrane-associated proteins termed MAGUKs (membrane-associated guanylate kinase homologs). MAGUKs interact with the cytoskeleton and regulate cell proliferation, signaling pathways, and intracellular junctions. Palmitoylated membrane protein 2 contains a conserved sequence, called the SH3 (src homology 3) motif, found in several other proteins that associate with the cytoskeleton and are suspected to play important roles in signal transduction.
NA ENSG00000271218 ENSG00000271218 RP3-523E19.2 NA NA
NA ENSG00000229953 ENSG00000229953 RP11-284F21.7 NA NA
NA ENSG00000268751 643719 SCGB1B2P secretoglobin family 1B member 2, pseudogene NA
NA ENSG00000179057 283284 IGSF22 immunoglobulin superfamily member 22 NA
NA ENSG00000186260 57496 MKL2 MKL1/myocardin like 2 NA
NA ENSG00000081803 93664 CADPS2 Ca2+ dependent secretion activator 2 This gene encodes a member of the calcium-dependent activator of secretion (CAPS) protein family, which are calcium binding proteins that regulate the exocytosis of synaptic and dense-core vesicles in neurons and neuroendocrine cells. Mutations in this gene may contribute to autism susceptibility. Multiple transcript variants encoding different isoforms have been found for this gene.
NA ENSG00000272468 ENSG00000272468 RP1-86C11.7 NA NA
NA ENSG00000003147 3382 ICA1 islet cell autoantigen 1 This gene encodes a protein with an arfaptin homology domain that is found both in the cytosol and as membrane-bound form on the Golgi complex and immature secretory granules. This protein is believed to be an autoantigen in insulin-dependent diabetes mellitus and primary Sjogren’s syndrome. Several transcript variants encoding two different isoforms have been found for this gene.
NA ENSG00000111879 79632 FAM184A family with sequence similarity 184 member A NA
NA ENSG00000156968 255027 MPV17L MPV17 mitochondrial inner membrane protein like NA
NA ENSG00000161835 160622 GRASP GRP1 (general receptor for phosphoinositides 1)-associated scaffold protein This gene encodes a protein that functions as a molecular scaffold, linking receptors, including group 1 metabotropic glutamate receptors, to neuronal proteins. The encoded protein contains conserved domains, including a leucine zipper sequence, PDZ domain and a C-terminal PDZ-binding motif. Alternately spliced transcript variants have been observed for this gene.
NA ENSG00000156113 3778 KCNMA1 potassium calcium-activated channel subfamily M alpha 1 MaxiK channels are large conductance, voltage and calcium-sensitive potassium channels which are fundamental to the control of smooth muscle tone and neuronal excitability. MaxiK channels can be formed by 2 subunits: the pore-forming alpha subunit, which is the product of this gene, and the modulatory beta subunit. Intracellular calcium regulates the physical association between the alpha and beta subunits. Alternatively spliced transcript variants encoding different isoforms have been identified.
NA ENSG00000162438 11330 CTRC chymotrypsin C This gene encodes a member of the peptidase S1 family. The encoded protein is a serum calcium-decreasing factor that has chymotrypsin-like protease activity. Alternatively spliced transcript variants have been observed, but their full-length nature has not been determined.
NA ENSG00000063180 770 CA11 carbonic anhydrase 11 Carbonic anhydrases (CAs) are a large family of zinc metalloenzymes that catalyze the reversible hydration of carbon dioxide. They participate in a variety of biological processes, including respiration, calcification, acid-base balance, bone resorption, and the formation of aqueous humor, cerebrospinal fluid, saliva, and gastric acid. They show extensive diversity in tissue distribution and in their subcellular localization. CA XI is likely a secreted protein, however, radical changes at active site residues completely conserved in CA isozymes with catalytic activity, make it unlikely that it has carbonic anhydrase activity. It shares properties in common with two other acatalytic CA isoforms, CA VIII and CA X. CA XI is most abundantly expressed in brain, and may play a general role in the central nervous system.
NA ENSG00000172264 140733 MACROD2 MACRO domain containing 2 NA
NA ENSG00000186352 353322 ANKRD37 ankyrin repeat domain 37 NA
NA ENSG00000136160 1910 EDNRB endothelin receptor type B The protein encoded by this gene is a G protein-coupled receptor which activates a phosphatidylinositol-calcium second messenger system. Its ligand, endothelin, consists of a family of three potent vasoactive peptides: ET1, ET2, and ET3. Studies suggest that the multigenic disorder, Hirschsprung disease type 2, is due to mutations in the endothelin receptor type B gene. Alternatively spliced transcript variants encoding different isoforms have been found for this gene.
NA ENSG00000129595 64097 EPB41L4A erythrocyte membrane protein band 4.1 like 4A Members of the band 4.1 protein superfamily, including EPB41L4A, are thought to regulate the interaction between the cytoskeleton and plasma membrane (Ishiguro et al., 2000 [PubMed 10874211]).
NA ENSG00000215845 100131187 TSTD1 thiosulfate sulfurtransferase like domain containing 1 NA
NA ENSG00000106123 2051 EPHB6 EPH receptor B6 This gene encodes a member of a family of transmembrane proteins that function as receptors for ephrin-B family proteins. Unlike other members of this family, the encoded protein does not contain a functional kinase domain. Activity of this protein can influence cell adhesion and migration. Expression of this gene is downregulated during tumor progression, suggesting that the protein may suppress tumor invasion and metastasis. Alternative splicing results in multiple transcript variants.
NA ENSG00000188763 8326 FZD9 frizzled class receptor 9 Members of the ‘frizzled’ gene family encode 7-transmembrane domain proteins that are receptors for Wnt signaling proteins. The FZD9 gene is located within the Williams syndrome common deletion region of chromosome 7, and heterozygous deletion of the FZD9 gene may contribute to the Williams syndrome phenotype. FZD9 is expressed predominantly in brain, testis, eye, skeletal muscle, and kidney.
NA ENSG00000143536 49860 CRNN cornulin This gene encodes a member of the ‘fused gene’ family of proteins, which contain N-terminus EF-hand domains and multiple tandem peptide repeats. The encoded protein contains two EF-hand Ca2+ binding domains in its N-terminus and two glutamine- and threonine-rich 60 amino acid repeats in its C-terminus. This gene, also known as squamous epithelial heat shock protein 53, may play a role in the mucosal/epithelial immune response and epidermal differentiation.
NA ENSG00000072201 84708 LNX1 ligand of numb-protein X 1 This gene encodes a membrane-bound protein that is involved in signal transduction and protein interactions. The encoded product is an E3 ubiquitin-protein ligase, which mediates ubiquitination and subsequent proteasomal degradation of proteins containing phosphotyrosine binding (PTB) domains. This protein may play an important role in tumorigenesis. Alternatively spliced transcript variants encoding distinct isoforms have been described. A pseudogene, which is located on chromosome 17, has been identified for this gene.
NA ENSG00000171159 79095 C9orf16 chromosome 9 open reading frame 16 NA
NA ENSG00000183049 57118 CAMK1D calcium/calmodulin dependent protein kinase ID This gene is a member of the calcium/calmodulin-dependent protein kinase 1 family, a subfamily of the serine/threonine kinases. The encoded protein is a component of the calcium-regulated calmodulin-dependent protein kinase cascade. It has been associated with multiple processes including regulation of granulocyte function, activation of CREB-dependent gene transcription, aldosterone synthesis, differentiation and activation of neutrophil cells, and apoptosis of erythroleukemia cells. Alternatively spliced transcript variants encoding different isoforms of this gene have been described.
NA ENSG00000076554 7163 TPD52 tumor protein D52 NA
NA ENSG00000188779 390598 SKOR1 SKI family transcriptional corepressor 1 NA
NA ENSG00000162772 467 ATF3 activating transcription factor 3 This gene encodes a member of the mammalian activation transcription factor/cAMP responsive element-binding (CREB) protein family of transcription factors. This gene is induced by a variety of signals, including many of those encountered by cancer cells, and is involved in the complex process of cellular stress response. Multiple transcript variants encoding different isoforms have been found for this gene. It is possible that alternative splicing of this gene may be physiologically important in the regulation of target genes.
NA ENSG00000261113 ENSG00000261113 RP11-141O15.1 NA NA
# Save the Ensembl IDs queried for factor 13, one per line.
write.table(as.factor(out$query),
            paste0("../utilities/GTEX2013_sparse_load_voom/gene_names_clus_", 13, ".txt"),
            col.names = FALSE, row.names = FALSE, quote = FALSE)
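
Some rows in the annotation table have notfound set to TRUE (for example ENSG00000184674), meaning mygene returned no annotation for that Ensembl ID. The following is a minimal sketch, assuming one wants to keep annotated and unannotated IDs apart before writing them out. Here `annot` is a toy stand-in for `as.data.frame(out)` and `split_by_hit` is a hypothetical helper; neither is part of mygene or of this analysis.

```r
# Toy stand-in for as.data.frame(out); the real object comes from mygene::queryMany().
annot <- data.frame(
  query    = c("ENSG00000171401", "ENSG00000184674", "ENSG00000039068"),
  symbol   = c("KRT13", NA, "CDH1"),
  notfound = c(NA, TRUE, NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper: split query IDs by whether an annotation was returned.
split_by_hit <- function(df) {
  # Assumption: the notfound column may be absent when every query matched,
  # as in the Factor 1 table, so treat a missing column as "all found".
  if ("notfound" %in% names(df)) {
    missed <- !is.na(df$notfound) & df$notfound
  } else {
    missed <- rep(FALSE, nrow(df))
  }
  list(found = df$query[!missed], missing = df$query[missed])
}

ids <- split_by_hit(annot)
ids$found    # IDs with an annotation
ids$missing  # IDs mygene could not match
```

Either vector could then be passed to `write.table()` in place of `out$query`, depending on whether the unmatched IDs should be carried into the downstream gene lists.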

Factor 14 Annotations

out <- mygene::queryMany(gene_list[14,],  scopes="ensembl.gene", fields=c("name", "summary", "symbol"), species="human");
## Finished
## Pass returnall=TRUE to return lists of duplicate or missing query terms.
kable(as.data.frame(out))
X_id symbol summary query name notfound
5967 REG1A This gene is a type I subclass member of the Reg gene family. The Reg gene family is a multigene family grouped into four subclasses, types I, II, III and IV, based on the primary structures of the encoded proteins. This gene encodes a protein that is secreted by the exocrine pancreas. It is associated with islet cell regeneration and diabetogenesis and may be involved in pancreatic lithogenesis. Reg family members REG1B, REGL, PAP and this gene are tandemly clustered on chromosome 2p12 and may have arisen from the same ancestral gene by gene duplication. ENSG00000115386 regenerating family member 1 alpha NA
NA NA NA ENSG00000165862 NA TRUE
22943 DKK1 This gene encodes a protein that is a member of the dickkopf family. It is a secreted protein with two cysteine rich regions and is involved in embryonic development through its inhibition of the WNT signaling pathway. Elevated levels of DKK1 in bone marrow plasma and peripheral blood are associated with the presence of osteolytic bone lesions in patients with multiple myeloma. ENSG00000107984 dickkopf WNT signaling pathway inhibitor 1 NA
123036 TC2N NA ENSG00000165929 tandem C2 domains, nuclear NA
132299 OCIAD2 NA ENSG00000145247 OCIA domain containing 2 NA
9874 TLK1 The protein encoded by this gene is a serine/threonine kinase that may be involved in the regulation of chromatin assembly. The encoded protein is only active when it is phosphorylated, and this phosphorylation is cell cycle-dependent, with the maximal activity of this protein coming during S phase. The catalytic activity of this protein is diminished by DNA damage and by blockage of DNA replication. Three transcript variants encoding different isoforms have been found for this gene. ENSG00000198586 tousled like kinase 1 NA
5721 PSME2 The 26S proteasome is a multicatalytic proteinase complex with a highly ordered structure composed of 2 complexes, a 20S core and a 19S regulator. The 20S core is composed of 4 rings of 28 non-identical subunits; 2 rings are composed of 7 alpha subunits and 2 rings are composed of 7 beta subunits. The 19S regulator is composed of a base, which contains 6 ATPase subunits and 2 non-ATPase subunits, and a lid, which contains up to 10 non-ATPase subunits. Proteasomes are distributed throughout eukaryotic cells at a high concentration and cleave peptides in an ATP/ubiquitin-dependent process in a non-lysosomal pathway. An essential function of a modified proteasome, the immunoproteasome, is the processing of class I MHC peptides. The immunoproteasome contains an alternate regulator, referred to as the 11S regulator or PA28, that replaces the 19S regulator. Three subunits (alpha, beta and gamma) of the 11S regulator have been identified. This gene encodes the beta subunit of the 11S regulator, one of the two 11S subunits that is induced by gamma-interferon. Three beta and three alpha subunits combine to form a heterohexameric ring. Six pseudogenes have been identified on chromosomes 4, 5, 8, 10 and 13. ENSG00000100911 proteasome activator subunit 2 NA
5644 PRSS1 This gene encodes a trypsinogen, which is a member of the trypsin family of serine proteases. This enzyme is secreted by the pancreas and cleaved to its active form in the small intestine. It is active on peptide linkages involving the carboxyl group of lysine or arginine. Mutations in this gene are associated with hereditary pancreatitis. This gene and several other trypsinogen genes are localized to the T cell receptor beta locus on chromosome 7. ENSG00000204983 protease, serine 1 NA
939 CD27 The protein encoded by this gene is a member of the TNF-receptor superfamily. This receptor is required for generation and long-term maintenance of T cell immunity. It binds to ligand CD70, and plays a key role in regulating B-cell activation and immunoglobulin synthesis. This receptor transduces signals that lead to the activation of NF-kappaB and MAPK8/JNK. Adaptor proteins TRAF2 and TRAF5 have been shown to mediate the signaling process of this receptor. CD27-binding protein (SIVA), a proapoptotic protein, can bind to this receptor and is thought to play an important role in the apoptosis induced by this receptor. ENSG00000139193 CD27 molecule NA
122618 PLD4 NA ENSG00000166428 phospholipase D family member 4 NA
23403 FBXO46 Members of the F-box protein family, such as FBXO46, are characterized by an approximately 40-amino acid F-box motif. SCF complexes, formed by SKP1 (MIM 601434), cullin (see CUL1; MIM 603134), and F-box proteins, act as protein-ubiquitin ligases. F-box proteins interact with SKP1 through the F box, and they interact with ubiquitination targets through other protein interaction domains (Jin et al., 2004 [PubMed 15520277]). ENSG00000177051 F-box protein 46 NA
ENSG00000233849 AC022201.5 NA ENSG00000233849 NA NA
6119 RPA3 NA ENSG00000106399 replication protein A3 NA
64393 ZMAT3 This gene encodes a protein containing three zinc finger domains and a nuclear localization signal. The mRNA and the protein of this gene are upregulated by wildtype p53 and overexpression of this gene inhibits tumor cell growth, suggesting that this gene may have a role in the p53-dependent growth regulatory pathway. Alternative splicing of this gene results in two transcript variants encoding two isoforms differing in only one amino acid. ENSG00000172667 zinc finger matrin-type 3 NA
163786 SASS6 SAS6 is necessary for centrosome duplication and functions during procentriole formation; SAS6 functions to ensure that each centriole seeds the formation of a single procentriole per cell cycle (Strnad et al., 2007 [PubMed 17681132]). ENSG00000156876 SAS-6 centriolar assembly protein NA
117584 RFFL NA ENSG00000092871 ring finger and FYVE-like domain containing E3 ubiquitin protein ligase NA
34 ACADM This gene encodes the medium-chain specific (C4 to C12 straight chain) acyl-Coenzyme A dehydrogenase. The homotetramer enzyme catalyzes the initial step of the mitochondrial fatty acid beta-oxidation pathway. Defects in this gene cause medium-chain acyl-CoA dehydrogenase deficiency, a disease characterized by hepatic dysfunction, fasting hypoglycemia, and encephalopathy, which can result in infantile death. Alternatively spliced transcript variants encoding different isoforms have been found for this gene. ENSG00000117054 acyl-CoA dehydrogenase, C-4 to C-12 straight chain NA
ENSG00000237950 RP11-7O11.3 NA ENSG00000237950 NA NA
ENSG00000183444 OR7E38P NA ENSG00000183444 olfactory receptor family 7 subfamily E member 38 pseudogene NA
11179 ZNF277 NA ENSG00000198839 zinc finger protein 277 NA
6772 STAT1 The protein encoded by this gene is a member of the STAT protein family. In response to cytokines and growth factors, STAT family members are phosphorylated by the receptor associated kinases, and then form homo- or heterodimers that translocate to the cell nucleus where they act as transcription activators. This protein can be activated by various ligands including interferon-alpha, interferon-gamma, EGF, PDGF and IL6. This protein mediates the expression of a variety of genes, which is thought to be important for cell viability in response to different cell stimuli and pathogens. Two alternatively spliced transcript variants encoding distinct isoforms have been described. ENSG00000115415 signal transducer and activator of transcription 1 NA
ENSG00000230177 RP5-1112D6.4 NA ENSG00000230177 NA NA
1611 DAP This gene encodes a basic, proline-rich, 15-kD protein. The protein acts as a positive mediator of programmed cell death that is induced by interferon-gamma. Alternatively spliced transcript variants encoding distinct isoforms have been found for this gene. ENSG00000112977 death-associated protein NA
55349 CHDH The protein encoded by this gene is a choline dehydrogenase that localizes to the mitochondrion. Variations in this gene can affect susceptibility to choline deficiency. A few transcript variants have been found for this gene, but the full-length nature of only one has been characterized to date. ENSG00000016391 choline dehydrogenase NA
3978 LIG1 This gene encodes a member of the ATP-dependent DNA ligase protein family. The encoded protein functions in DNA replication, recombination, and the base excision repair process. Mutations in this gene that lead to DNA ligase I deficiency result in immunodeficiency and increased sensitivity to DNA-damaging agents. Disruption of this gene may also be associated with a variety of cancers. Alternative splicing results in multiple transcript variants. ENSG00000105486 DNA ligase 1 NA
11019 LIAS The protein encoded by this gene belongs to the biotin and lipoic acid synthetases family. It localizes in mitochondrion and plays an important role in alpha-(+)-lipoic acid synthesis. It may also function in the sulfur insertion chemistry in lipoate biosynthesis. Alternative splicing occurs at this locus and two transcript variants encoding distinct isoforms have been identified. ENSG00000121897 lipoic acid synthetase NA
221294 NT5DC1 While the exact function of the protein encoded by this gene is not known, it belongs to the 5’(3’)-deoxyribonucleotidase family. ENSG00000178425 5’-nucleotidase domain containing 1 NA
ENSG00000203644 RP11-332M2.1 NA ENSG00000203644 NA NA
55166 CENPQ CENPQ is a subunit of a CENPH (MIM 605607)-CENPI (MIM 300065)-associated centromeric complex that targets CENPA (MIM 117139) to centromeres and is required for proper kinetochore function and mitotic progression (Okada et al., 2006 [PubMed 16622420]). ENSG00000031691 centromere protein Q NA
84333 PCGF5 NA ENSG00000180628 polycomb group ring finger 5 NA
2222 FDFT1 This gene encodes a membrane-associated enzyme located at a branch point in the mevalonate pathway. The encoded protein is the first specific enzyme in cholesterol biosynthesis, catalyzing the dimerization of two molecules of farnesyl diphosphate in a two-step reaction to form squalene. ENSG00000079459 farnesyl-diphosphate farnesyltransferase 1 NA
29916 SNX11 This gene encodes a member of the sorting nexin family. Members of this family contain a phox (PX) domain, which is a phosphoinositide binding domain, and are involved in intracellular trafficking. This protein does not contain a coiled coil region, like some family members. This gene encodes a protein of unknown function. This gene results in two transcript variants differing in the 5’ UTR, but encoding the same protein. ENSG00000002919 sorting nexin 11 NA
ENSG00000236326 RP3-486I3.5 NA ENSG00000236326 NA NA
85441 HELZ2 The protein encoded by this gene is a nuclear transcriptional co-activator for peroxisome proliferator activated receptor alpha. The encoded protein contains a zinc finger and is a helicase that appears to be part of the peroxisome proliferator activated receptor alpha interacting complex. This gene is a member of the DNA2/NAM7 helicase gene family. Alternatively spliced transcript variants encoding different isoforms have been found for this gene. ENSG00000130589 helicase with zinc finger 2 NA
121441 NEDD1 NA ENSG00000139350 neural precursor cell expressed, developmentally down-regulated 1 NA
51478 HSD17B7 HSD17B7 encodes an enzyme that functions both as a 17-beta-hydroxysteroid dehydrogenase (EC 1.1.1.62) in the biosynthesis of sex steroids and as a 3-ketosteroid reductase (EC 1.1.1.270) in the biosynthesis of cholesterol (Marijanovic et al., 2003 [PubMed 12829805]). ENSG00000132196 hydroxysteroid 17-beta dehydrogenase 7 NA
5229 PGGT1B Protein geranylgeranyltransferase type I (GGTase-I) transfers a geranylgeranyl group to the cysteine residue of candidate proteins containing a C-terminal CAAX motif in which ‘A’ is an aliphatic amino acid and ‘X’ is leucine (summarized by Zhang et al., 1994 [PubMed 8106351]). The enzyme is composed of a 48-kD alpha subunit (FNTA; MIM 134635) and a 43-kD beta subunit, encoded by the PGGT1B gene. The FNTA gene encodes the alpha subunit for both GGTase-I and the related enzyme farnesyltransferase. ENSG00000164219 protein geranylgeranyltransferase type I subunit beta NA
ENSG00000213621 RPSAP54 NA ENSG00000213621 ribosomal protein SA pseudogene 54 NA
2272 FHIT This gene, a member of the histidine triad gene family, encodes a diadenosine 5’,5’’’-P1,P3-triphosphate hydrolase involved in purine metabolism. The gene encompasses the common fragile site FRA3B on chromosome 3, where carcinogen-induced damage can lead to translocations and aberrant transcripts of this gene. In fact, aberrant transcripts from this gene have been found in about half of all esophageal, stomach, and colon carcinomas. Alternatively spliced transcript variants have been found for this gene. ENSG00000189283 fragile histidine triad NA
23306 NEMP1 NA ENSG00000166881 nuclear envelope integral membrane protein 1 NA
4528 MTIF2 During the initiation of protein biosynthesis, initiation factor-2 (IF-2) promotes the binding of the initiator tRNA to the small subunit of the ribosome in a GTP-dependent manner. Prokaryotic IF-2 is a single polypeptide, while eukaryotic cytoplasmic IF-2 (eIF-2) is a trimeric protein. Bovine liver mitochondria contain IF-2(mt), an 85-kD monomeric protein that is equivalent to prokaryotic IF-2. The predicted 727-amino acid human protein contains a 29-amino acid presequence. Human IF-2(mt) shares 32 to 38% amino acid sequence identity with yeast IF-2(mt) and several prokaryotic IF-2s, with the greatest degree of conservation in the G domains of the proteins. ENSG00000085760 mitochondrial translational initiation factor 2 NA
ENSG00000229931 RP1-151F17.1 NA ENSG00000229931 NA NA
79603 CERS4 NA ENSG00000090661 ceramide synthase 4 NA
5096 PCCB The protein encoded by this gene is a subunit of the propionyl-CoA carboxylase (PCC) enzyme, which is involved in the catabolism of propionyl-CoA. PCC is a mitochondrial enzyme that probably acts as a dodecamer of six alpha subunits and six beta subunits. This gene encodes the beta subunit of PCC. Defects in this gene are a cause of propionic acidemia type II (PA-2). Multiple transcript variants encoding different isoforms have been found for this gene. ENSG00000114054 propionyl-CoA carboxylase beta subunit NA
51527 GSKIP This gene encodes a protein that is involved as a negative regulator of GSK3-beta in the Wnt signaling pathway. The encoded protein may play a role in the retinoic acid signaling pathway by regulating the functional interactions between GSK3-beta, beta-catenin and cyclin D1, and it regulates the beta-catenin/N-cadherin pool. The encoded protein contains a GSK3-beta interacting domain (GID) in its C-terminus, which is similar to the GID of Axin. The protein also contains an evolutionarily conserved RII-binding domain, which facilitates binding with protein kinase-A and GSK3-beta, enabling its role as an A-kinase anchoring protein. Alternatively spliced transcript variants have been observed for this gene. ENSG00000100744 GSK3B interacting protein NA
79980 DSN1 This gene encodes a kinetochore protein that functions as part of the minichromosome instability-12 centromere complex. The encoded protein is required for proper kinetochore assembly and progression through the cell cycle. Alternative splicing results in multiple transcript variants. ENSG00000149636 DSN1 homolog, MIS12 kinetochore complex component NA
5636 PRPSAP2 This gene encodes a protein that associates with the enzyme phosphoribosylpyrophosphate synthetase (PRS). PRS catalyzes the formation of phosphoribosylpyrophosphate which is a substrate for synthesis of purine and pyrimidine nucleotides, histidine, tryptophan and NAD. PRS exists as a complex with two catalytic subunits and two associated subunits. This gene encodes a non-catalytic associated subunit of PRS. Alternate splicing results in multiple transcript variants. ENSG00000141127 phosphoribosyl pyrophosphate synthetase associated protein 2 NA
51251 NT5C3A This gene encodes a member of the 5’-nucleotidase family of enzymes that catalyze the dephosphorylation of nucleoside 5’-monophosphates. The encoded protein is the type 1 isozyme of pyrimidine 5’ nucleotidase and catalyzes the dephosphorylation of pyrimidine 5’ monophosphates. Mutations in this gene are a cause of hemolytic anemia due to uridine 5-prime monophosphate hydrolase deficiency. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene, and pseudogenes of this gene are located on the long arm of chromosomes 3 and 4. ENSG00000122643 5’-nucleotidase, cytosolic IIIA NA
NA NA NA ENSG00000233137 NA TRUE
51106 TFB1M The protein encoded by this gene is a dimethyltransferase that methylates the conserved stem loop of mitochondrial 12S rRNA. The encoded protein also is part of the basal mitochondrial transcription complex and is necessary for mitochondrial gene expression. The methylation and transcriptional activities of this protein are independent of one another. Variations in this gene may influence the severity of aminoglycoside-induced deafness (AID). ENSG00000029639 transcription factor B1, mitochondrial NA
26148 C10orf12 NA ENSG00000155640 chromosome 10 open reading frame 12 NA
ENSG00000216895 AC009403.2 NA ENSG00000216895 NA NA
283643 C14orf80 NA ENSG00000185347 chromosome 14 open reading frame 80 NA
57001 SDHAF3 NA ENSG00000196636 succinate dehydrogenase complex assembly factor 3 NA
ENSG00000261684 RP11-265N6.1 NA ENSG00000261684 NA NA
339745 SPOPL NA ENSG00000144228 speckle type BTB/POZ protein like NA
8717 TRADD The protein encoded by this gene is a death domain containing adaptor molecule that interacts with TNFRSF1A/TNFR1 and mediates programmed cell death signaling and NF-kappaB activation. This protein binds adaptor protein TRAF2, reduces the recruitment of inhibitor-of-apoptosis proteins (IAPs) by TRAF2, and thus suppresses TRAF2 mediated apoptosis. This protein can also interact with receptor TNFRSF6/FAS and adaptor protein FADD/MORT1, and is involved in the Fas-induced cell death pathway. ENSG00000102871 TNFRSF1A associated via death domain NA
NA NA NA ENSG00000129282 NA TRUE
150962 PUS10 Pseudouridination, the isomerization of uridine to pseudouridine, is the most common posttranscriptional nucleotide modification found in RNA and is essential for biologic functions such as spliceosome biogenesis. Pseudouridylate synthases, such as PUS10, catalyze pseudouridination of structural RNAs, including transfer, ribosomal, and splicing RNAs. These enzymes also act as RNA chaperones, facilitating the correct folding and assembly of tRNAs (McCleverty et al., 2007 [PubMed 17900615]). ENSG00000162927 pseudouridylate synthase 10 NA
10362 HMG20B NA ENSG00000064961 high mobility group 20B NA
195828 ZNF367 NA ENSG00000165244 zinc finger protein 367 NA
100131187 TSTD1 NA ENSG00000215845 thiosulfate sulfurtransferase like domain containing 1 NA
168374 ZNF92 NA ENSG00000146757 zinc finger protein 92 NA
55157 DARS2 The protein encoded by this gene belongs to the class-II aminoacyl-tRNA synthetase family. It is a mitochondrial enzyme that specifically aminoacylates aspartyl-tRNA. Mutations in this gene are associated with leukoencephalopathy with brainstem and spinal cord involvement and lactate elevation (LBSL). ENSG00000117593 aspartyl-tRNA synthetase 2, mitochondrial NA
ENSG00000223551 TMSB4XP4 NA ENSG00000223551 thymosin beta 4, X-linked pseudogene 4 NA
4507 MTAP This gene encodes an enzyme that plays a major role in polyamine metabolism and is important for the salvage of both adenine and methionine. The encoded enzyme is deficient in many cancers because this gene and the tumor suppressor p16 gene are co-deleted. Multiple alternatively spliced transcript variants have been described for this gene, but their full-length natures remain unknown. ENSG00000099810 methylthioadenosine phosphorylase NA
ENSG00000212789 ST13P5 NA ENSG00000212789 suppression of tumorigenicity 13 (colon carcinoma) (Hsp70 interacting protein) pseudogene 5 NA
55732 C1orf112 NA ENSG00000000460 chromosome 1 open reading frame 112 NA
ENSG00000182165 TP53TG1 NA ENSG00000182165 TP53 target 1 (non-protein coding) NA
3455 IFNAR2 The protein encoded by this gene is a type I membrane protein that forms one of the two chains of a receptor for interferons alpha and beta. Binding and activation of the receptor stimulates Janus protein kinases, which in turn phosphorylate several proteins, including STAT1 and STAT2. Multiple transcript variants encoding at least two different isoforms have been found for this gene. ENSG00000159110 interferon alpha and beta receptor subunit 2 NA
644591 PPIAL4G NA ENSG00000236334 peptidylprolyl isomerase A like 4G NA
196743 PAOX NA ENSG00000148832 polyamine oxidase (exo-N4-amino) NA
ENSG00000218175 AC016739.2 NA ENSG00000218175 NA NA
89891 WDR34 This gene encodes a member of the WD repeat protein family. WD repeats are minimally conserved regions of approximately 40 amino acids typically bracketed by gly-his and trp-asp (GH-WD), which may facilitate formation of heterotrimeric or multiprotein complexes. Members of this family are involved in a variety of cellular processes, including cell cycle progression, signal transduction, apoptosis, and gene regulation. Defects in this gene are a cause of short-rib thoracic dysplasia 11 with or without polydactyly. ENSG00000119333 WD repeat domain 34 NA
26275 HIBCH This gene encodes the enzyme responsible for hydrolysis of both HIBYL-CoA and beta-hydroxypropionyl-CoA. Mutations in this gene have been associated with 3-hydroxyisobutyryl-CoA hydrolase deficiency. Alternative splicing results in multiple transcript variants. ENSG00000198130 3-hydroxyisobutyryl-CoA hydrolase NA
10519 CIB1 This gene encodes a member of the EF-hand domain-containing calcium-binding superfamily. The encoded protein interacts with many other proteins, including the platelet integrin alpha-IIb-beta-3, DNA-dependent protein kinase, presenilin-2, focal adhesion kinase, p21 activated kinase, and protein kinase D. The encoded protein may be involved in cell survival and proliferation, and is associated with several disease states including cancer and Alzheimer’s disease. Alternative splicing results in multiple transcript variants. ENSG00000185043 calcium and integrin binding 1 NA
3665 IRF7 IRF7 encodes interferon regulatory factor 7, a member of the interferon regulatory transcription factor (IRF) family. IRF7 has been shown to play a role in the transcriptional activation of virus-inducible cellular genes, including interferon beta chain genes. Inducible expression of IRF7 is largely restricted to lymphoid tissue. Multiple IRF7 transcript variants have been identified, although the functional consequences of these have not yet been established. ENSG00000185507 interferon regulatory factor 7 NA
25771 TBC1D22A NA ENSG00000054611 TBC1 domain family member 22A NA
339559 ZFP69 NA ENSG00000187815 ZFP69 zinc finger protein NA
4361 MRE11A This gene encodes a nuclear protein involved in homologous recombination, telomere length maintenance, and DNA double-strand break repair. By itself, the protein has 3’ to 5’ exonuclease activity and endonuclease activity. The protein forms a complex with the RAD50 homolog; this complex is required for nonhomologous joining of DNA ends and possesses increased single-stranded DNA endonuclease and 3’ to 5’ exonuclease activities. In conjunction with a DNA ligase, this protein promotes the joining of noncomplementary ends in vitro using short homologies near the ends of the DNA fragments. This gene has a pseudogene on chromosome 3. Alternative splicing of this gene results in two transcript variants encoding different isoforms. ENSG00000020922 MRE11 homolog A, double strand break repair nuclease NA
55734 ZFP64 NA ENSG00000020256 ZFP64 zinc finger protein NA
63979 FIGNL1 NA ENSG00000132436 fidgetin like 1 NA
900 CCNG1 The eukaryotic cell cycle is governed by cyclin-dependent protein kinases (CDKs) whose activities are regulated by cyclins and CDK inhibitors. The protein encoded by this gene is a member of the cyclin family and contains the cyclin box. The encoded protein lacks the protein destabilizing (PEST) sequence that is present in other family members. Transcriptional activation of this gene can be induced by tumor protein p53. Two transcript variants encoding the same protein have been identified for this gene. ENSG00000113328 cyclin G1 NA
55300 PI4K2B Phosphatidylinositol 4-kinases (PI4Ks) phosphorylate phosphatidylinositol to generate phosphatidylinositol 4-phosphate (PIP), an immediate precursor of several important signaling and scaffolding molecules. PIP itself may also have direct functional and structural roles. PI4K2B is a primarily cytosolic PI4K that is recruited to membranes, where it stimulates phosphatidylinositol 4,5-bisphosphate synthesis (Wei et al., 2002 [PubMed 12324459]). ENSG00000038210 phosphatidylinositol 4-kinase type 2 beta NA
25901 CCDC28A This gene is located in a region close to the locus of the pseudogene of chemokine (C-C motif) receptor-like 1 on chromosome 6. The specific function of this gene has not yet been determined. ENSG00000024862 coiled-coil domain containing 28A NA
201725 C4orf46 This gene encodes a small, conserved protein of unknown function that is expressed in a variety of tissues. There are pseudogenes for this gene on chromosomes 6, 8, 16, and X. Alternative splicing results in multiple transcript variants. ENSG00000205208 chromosome 4 open reading frame 46 NA
55170 PRMT6 The protein encoded by this gene belongs to the arginine N-methyltransferase family, which catalyze the sequential transfer of methyl group from S-adenosyl-L-methionine to the side chain nitrogens of arginine residues within proteins, to form methylated arginine derivatives and S-adenosyl-L-homocysteine. This protein can catalyze both, the formation of omega-N monomethylarginine and asymmetrical dimethylarginine, with a strong preference for the latter. It specifically mediates the asymmetric dimethylation of Arg2 of histone H3, and the methylated form represents a specific tag for epigenetic transcriptional repression. This protein also forms a complex with, and methylates DNA polymerase beta, resulting in stimulation of polymerase activity by enhancing DNA binding and processivity. ENSG00000198890 protein arginine methyltransferase 6 NA
57407 NMRAL1 This gene encodes an NADPH sensor protein that preferentially binds to NADPH. The encoded protein also negatively regulates the activity of NF-kappaB in a ubiquitylation-dependent manner. It plays a key role in cellular antiviral response by negatively regulating the interferon response factor 3-mediated expression of interferon beta. Alternative splicing of this gene results in multiple transcript variants. ENSG00000153406 NmrA-like family domain containing 1 NA
840 CASP7 This gene encodes a member of the cysteine-aspartic acid protease (caspase) family. Sequential activation of caspases plays a central role in the execution-phase of cell apoptosis. Caspases exist as inactive proenzymes which undergo proteolytic processing at conserved aspartic residues to produce two subunits, large and small, that dimerize to form the active enzyme. The precursor of the encoded protein is cleaved by caspase 3 and 10, is activated upon cell death stimuli and induces apoptosis. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene. ENSG00000165806 caspase 7 NA
55791 LRIF1 NA ENSG00000121931 ligand dependent nuclear receptor interacting factor 1 NA
ENSG00000269749 AC005614.5 NA ENSG00000269749 NA NA
84191 FAM96A NA ENSG00000166797 family with sequence similarity 96 member A NA
3157 HMGCS1 NA ENSG00000112972 3-hydroxy-3-methylglutaryl-CoA synthase 1 NA
ENSG00000269534 CTC-453G23.5 NA ENSG00000269534 NA NA
79828 METTL8 NA ENSG00000123600 methyltransferase like 8 NA
ENSG00000261438 RP11-399O19.9 NA ENSG00000261438 NA NA
79866 BORA BORA is an activator of the protein kinase Aurora A (AURKA; MIM 603072), which is required for centrosome maturation, spindle assembly, and asymmetric protein localization during mitosis (Hutterer et al., 2006 [PubMed 16890155]). ENSG00000136122 bora, aurora kinase A activator NA
129531 MITD1 Abscission, the separation of daughter cells at the end of cytokinesis, is effected by endosomal sorting complexes required for transport III (ESCRT-III). The protein encoded by this gene functions as a homodimer, with the N-termini binding to a subset of ESCRT-III subunits and the C-termini binding to membranes. The encoded protein regulates ESCRT-III activity and is required for proper cytokinesis. Several transcript variants encoding different isoforms have been found for this gene. ENSG00000158411 microtubule interacting and trafficking domain containing 1 NA
100506100 LOC100506100 NA ENSG00000223478 uncharacterized LOC100506100 NA
11235 PDCD10 This gene encodes an evolutionarily conserved protein associated with cell apoptosis. The protein interacts with the serine/threonine protein kinase MST4 to modulate the extracellular signal-regulated kinase (ERK) pathway. It also interacts with and is phosphorylated by serine/threonine kinase 25, and is thought to function in a signaling pathway essential for vascular development. Mutations in this gene are one cause of cerebral cavernous malformations, which are vascular malformations that cause seizures and cerebral hemorrhages. Multiple alternatively spliced variants, encoding the same protein, have been identified. ENSG00000114209 programmed cell death 10 NA
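Some rows in the table above have `notfound` set to `TRUE` (for example ENSG00000233137), meaning the query returned no annotation, and the column order differs between factor tables. The following is a minimal sketch of one way to tidy the result before display, assuming `out` has been coerced to a data frame with the columns shown in these tables. The helper name `clean_annotations` and the toy data frame are hypothetical and not part of the original analysis.

```r
# Hypothetical helper: drop unmatched queries and fix the column order
# so that every factor's table has the same layout.
clean_annotations <- function(out_df,
                              cols = c("query", "symbol", "name", "summary")) {
  # 'notfound' is TRUE for unmatched IDs and NA otherwise; the column may be
  # absent when every query matched.
  if ("notfound" %in% names(out_df)) {
    out_df <- out_df[is.na(out_df$notfound) | !out_df$notfound, , drop = FALSE]
  }
  # Keep only the requested columns, in the requested order.
  out_df[, intersect(cols, names(out_df)), drop = FALSE]
}

# Toy stand-in for as.data.frame(out), using three rows from the table above.
toy <- data.frame(
  query    = c("ENSG00000100744", "ENSG00000233137", "ENSG00000149636"),
  symbol   = c("GSKIP", NA, "DSN1"),
  name     = c("GSK3B interacting protein", NA,
               "DSN1 homolog, MIS12 kinetochore complex component"),
  summary  = NA_character_,
  notfound = c(NA, TRUE, NA),
  stringsAsFactors = FALSE
)

cleaned <- clean_annotations(toy)
print(cleaned$query)
```

With the real result, the call would be `kable(clean_annotations(as.data.frame(out)))`, under the same assumptions about column names.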
write.table(as.factor(out$query), paste0("../utilities/GTEX2013_sparse_load_voom/gene_names_clus_",14,".txt"), col.names = FALSE,
            row.names=FALSE, quote=FALSE);
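
The export step above hard-codes the factor index in the file name and is repeated for each factor. As a rough sketch, the same files could be written in one loop, assuming the IDs in `out$query` match the corresponding row of `gene_list`. The helper `write_factor_genes` and the toy matrix are hypothetical, and the example writes to a temporary directory rather than `../utilities/`.

```r
# Hypothetical helper: write one gene-ID file per factor (row of gene_list).
write_factor_genes <- function(gene_list, out_dir) {
  paths <- character(nrow(gene_list))
  for (k in seq_len(nrow(gene_list))) {
    paths[k] <- file.path(out_dir, paste0("gene_names_clus_", k, ".txt"))
    # One Ensembl ID per line, no header or quotes, like the write.table call.
    writeLines(gene_list[k, ], paths[k])
  }
  invisible(paths)
}

# Toy stand-in for gene_list: 2 factors x 2 genes, written to a temp directory.
toy_list <- rbind(c("ENSG00000100744", "ENSG00000149636"),
                  c("ENSG00000197249", "ENSG00000161692"))
paths <- write_factor_genes(toy_list, tempdir())
print(readLines(paths[2]))
```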

Factor 15 Annotations

out <- mygene::queryMany(gene_list[15,],  scopes="ensembl.gene", fields=c("name", "summary", "symbol"), species="human");
## Finished
## Pass returnall=TRUE to return lists of duplicate or missing query terms.
kable(as.data.frame(out))
query symbol X_id summary name notfound
ENSG00000197249 SERPINA1 5265 The protein encoded by this gene is secreted and is a serine protease inhibitor whose targets include elastase, plasmin, thrombin, trypsin, chymotrypsin, and plasminogen activator. Defects in this gene can cause emphysema or liver disease. Several transcript variants encoding the same protein have been found for this gene. serpin family A member 1 NA
ENSG00000254814 RP11-535A19.1 ENSG00000254814 NA NA NA
ENSG00000225075 RP11-426L16.3 ENSG00000225075 NA NA NA
ENSG00000161692 DBF4B 80174 This gene encodes a regulator of the cell division cycle 7 homolog (S. cerevisiae) protein, a serine-threonine kinase which links cell cycle regulation to genome duplication. This protein localizes to the nucleus and, in complex with the cell division cycle 7 homolog (S. cerevisiae) protein, may facilitate M phase progression. Alternative splicing results in multiple transcript variants. DBF4 zinc finger B NA
ENSG00000240494 RPS12P28 ENSG00000240494 NA ribosomal protein S12 pseudogene 28 NA
ENSG00000272146 ARF4-AS1 106144532 NA ARF4 antisense RNA 1 NA
ENSG00000184635 ZNF93 81931 NA zinc finger protein 93 NA
ENSG00000236618 PITPNA-AS1 100306951 NA PITPNA antisense RNA 1 NA
ENSG00000141570 CBX8 57332 NA chromobox 8 NA
ENSG00000138166 DUSP5 1847 The protein encoded by this gene is a member of the dual specificity protein phosphatase subfamily. These phosphatases inactivate their target kinases by dephosphorylating both the phosphoserine/threonine and phosphotyrosine residues. They negatively regulate members of the mitogen-activated protein (MAP) kinase superfamily (MAPK/ERK, SAPK/JNK, p38), which are associated with cellular proliferation and differentiation. Different members of the family of dual specificity phosphatases show distinct substrate specificities for various MAP kinases, different tissue distribution and subcellular localization, and different modes of inducibility of their expression by extracellular stimuli. This gene product inactivates ERK1, is expressed in a variety of tissues with the highest levels in pancreas and brain, and is localized in the nucleus. dual specificity phosphatase 5 NA
ENSG00000167920 TMEM99 147184 NA transmembrane protein 99 NA
ENSG00000103995 CEP152 22995 This gene encodes a protein that is thought to be involved with centrosome function. Mutations in this gene have been associated with primary microcephaly (MCPH4). Alternative splicing results in multiple transcript variants. centrosomal protein 152 NA
ENSG00000213988 ZNF90 7643 NA zinc finger protein 90 NA
ENSG00000248932 LOC100507291 100507291 NA uncharacterized LOC100507291 NA
ENSG00000253540 FAM86HP ENSG00000253540 NA family with sequence similarity 86 member H, pseudogene NA
ENSG00000118162 KPTN 11133 This gene encodes a filamentous-actin-associated protein, which is involved in actin dynamics and plays an important role in neuromorphogenesis. Mutations in this gene result in recessive mental retardation-41. Alternatively spliced transcript variants have been found for this gene. kaptin (actin binding protein) NA
ENSG00000177051 FBXO46 23403 Members of the F-box protein family, such as FBXO46, are characterized by an approximately 40-amino acid F-box motif. SCF complexes, formed by SKP1 (MIM 601434), cullin (see CUL1; MIM 603134), and F-box proteins, act as protein-ubiquitin ligases. F-box proteins interact with SKP1 through the F box, and they interact with ubiquitination targets through other protein interaction domains (Jin et al., 2004 [PubMed 15520277]). F-box protein 46 NA
ENSG00000101405 OXT 5020 This gene encodes a precursor protein that is processed to produce oxytocin and neurophysin I. Oxytocin is a posterior pituitary hormone which is synthesized as an inactive precursor in the hypothalamus along with its carrier protein neurophysin I. Together with neurophysin, it is packaged into neurosecretory vesicles and transported axonally to the nerve endings in the neurohypophysis, where it is either stored or secreted into the bloodstream. The precursor seems to be activated while it is being transported along the axon to the posterior pituitary. This hormone contracts smooth muscle during parturition and lactation. It is also involved in cognition, tolerance, adaptation and complex sexual and maternal behaviour, as well as in the regulation of water excretion and cardiovascular functions. oxytocin/neurophysin I prepropeptide NA
ENSG00000270673 YTHDF3-AS1 101410533 NA YTHDF3 antisense RNA 1 (head to head) NA
ENSG00000266783 RP11-715F3.2 ENSG00000266783 NA NA NA
ENSG00000117586 TNFSF4 7292 This gene encodes a cytokine of the tumor necrosis factor (TNF) ligand family. The encoded protein functions in T cell antigen-presenting cell (APC) interactions and mediates adhesion of activated T cells to endothelial cells. Polymorphisms in this gene have been associated with Sjogren’s syndrome and systemic lupus erythematosus. Alternative splicing results in multiple transcript variants. tumor necrosis factor superfamily member 4 NA
ENSG00000272667 RP11-395A13.2 ENSG00000272667 NA NA NA
ENSG00000167543 TP53I13 90313 NA tumor protein p53 inducible protein 13 NA
ENSG00000267030 CTB-50L17.7 ENSG00000267030 NA NA NA
ENSG00000170430 MGMT 4255 Alkylating agents are potent carcinogens that can result in cell death, mutation and cancer. The protein encoded by this gene is a DNA repair protein that is involved in cellular defense against mutagenesis and toxicity from alkylating agents. The protein catalyzes transfer of methyl groups from O(6)-alkylguanine and other methylated moieties of the DNA to its own molecule, which repairs the toxic lesions. Methylation of the gene’s promoter has been associated with several cancer types, including colorectal cancer, lung cancer, lymphoma and glioblastoma. O-6-methylguanine-DNA methyltransferase NA
ENSG00000196476 C20orf96 140680 NA chromosome 20 open reading frame 96 NA
ENSG00000241073 RP4-714D9.2 ENSG00000241073 NA NA NA
ENSG00000166965 RCCD1 91433 NA RCC1 domain containing 1 NA
ENSG00000237015 CTA-984G1.5 ENSG00000237015 NA NA NA
ENSG00000213443 RP11-75L1.2 ENSG00000213443 NA NA NA
ENSG00000232677 LINC00665 100506930 NA long intergenic non-protein coding RNA 665 NA
ENSG00000259583 RP11-66B24.4 ENSG00000259583 NA NA NA
ENSG00000062282 DGAT2 84649 This gene encodes one of two enzymes which catalyzes the final reaction in the synthesis of triglycerides in which diacylglycerol is covalently bound to long chain fatty acyl-CoAs. The encoded protein catalyzes this reaction at low concentrations of magnesium chloride while the other enzyme has high activity at high concentrations of magnesium chloride. Multiple transcript variants encoding different isoforms have been found for this gene. diacylglycerol O-acyltransferase 2 NA
ENSG00000126453 BCL2L12 83596 This gene encodes a member of a family of proteins containing a Bcl-2 homology domain 2 (BH2). The encoded protein is an anti-apoptotic factor that acts as an inhibitor of caspases 3 and 7 in the cytoplasm. In the nucleus, it binds to the p53 tumor suppressor protein, preventing its association with target genes. Overexpression of this gene has been detected in a number of different cancers. There is a pseudogene for this gene on chromosome 3. Alternative splicing results in multiple transcript variants. BCL2 like 12 NA
ENSG00000106268 NUDT1 4521 Misincorporation of oxidized nucleoside triphosphates into DNA/RNA during replication and transcription can cause mutations that may result in carcinogenesis or neurodegeneration. The protein encoded by this gene is an enzyme that hydrolyzes oxidized purine nucleoside triphosphates, such as 8-oxo-dGTP, 8-oxo-dATP, 2-hydroxy-dATP, and 2-hydroxy rATP, to monophosphates, thereby preventing misincorporation. The encoded protein is localized mainly in the cytoplasm, with some in the mitochondria, suggesting that it is involved in the sanitization of nucleotide pools both for nuclear and mitochondrial genomes. Several alternatively spliced transcript variants, some of which encode distinct isoforms, have been identified. Additional variants have been observed, but their full-length natures have not been determined. A single-nucleotide polymorphism that results in the production of an additional, longer isoform (p26) has been described. nudix hydrolase 1 NA
ENSG00000155363 MOV10 4343 NA Mov10 RISC complex RNA helicase NA
ENSG00000105750 ZNF85 7639 NA zinc finger protein 85 NA
ENSG00000110011 DNAJC4 3338 NA DnaJ heat shock protein family (Hsp40) member C4 NA
ENSG00000144031 ANKRD53 79998 NA ankyrin repeat domain 53 NA
ENSG00000260018 RP11-505K9.1 ENSG00000260018 NA NA NA
ENSG00000261779 RP11-69H7.3 ENSG00000261779 NA NA NA
ENSG00000079462 PAFAH1B3 5050 This gene encodes an acetylhydrolase that catalyzes the removal of an acetyl group from the glycerol backbone of platelet-activating factor. The encoded enzyme is a subunit of the platelet-activating factor acetylhydrolase isoform 1B complex, which consists of the catalytic beta and gamma subunits and the regulatory alpha subunit. This complex functions in brain development. A translocation between this gene on chromosome 19 and the CDC-like kinase 2 gene on chromosome 1 has been observed, and was associated with mental retardation, ataxia, and atrophy of the brain. Alternatively spliced transcript variants have been described. platelet activating factor acetylhydrolase 1b catalytic subunit 3 NA
ENSG00000146066 HIGD2A 192286 NA HIG1 hypoxia inducible domain family member 2A NA
ENSG00000076248 UNG 7374 This gene encodes one of several uracil-DNA glycosylases. One important function of uracil-DNA glycosylases is to prevent mutagenesis by eliminating uracil from DNA molecules by cleaving the N-glycosylic bond and initiating the base-excision repair (BER) pathway. Uracil bases occur from cytosine deamination or misincorporation of dUMP residues. Alternative promoter usage and splicing of this gene leads to two different isoforms: the mitochondrial UNG1 and the nuclear UNG2. The UNG2 term was used as a previous symbol for the CCNO gene (GeneID 10309), which has been confused with this gene, in the literature and some databases. uracil DNA glycosylase NA
ENSG00000175854 SWI5 375757 NA SWI5 homologous recombination repair protein NA
ENSG00000114735 HEMK1 51409 NA HemK methyltransferase family member 1 NA
ENSG00000260136 CTD-2270L9.4 ENSG00000260136 NA NA NA
ENSG00000184465 WDR27 253769 This gene encodes a protein with multiple WD repeats. Proteins with these repeats may form scaffolds for protein-protein interaction and play key roles in cell signalling. Alternative splicing results in multiple transcript variants, but the full-length structure of some of these variants cannot be determined. WD repeat domain 27 NA
ENSG00000260517 RP11-426C22.5 ENSG00000260517 NA NA NA
ENSG00000221909 FAM200A 221786 This gene encodes a protein of unknown function. The protein is weakly similar to transposase-like proteins in human and mouse. family with sequence similarity 200 member A NA
ENSG00000236778 INTS6-AS1 ENSG00000236778 NA INTS6 antisense RNA 1 NA
ENSG00000166896 ATP23 91419 The protein encoded by this gene is amplified in glioblastomas and interacts with the DNA binding subunit of DNA-dependent protein kinase. This kinase is involved in double-strand break repair (DSB), and higher expression of the encoded protein increases the efficiency of DSB. In addition, comparison to orthologous proteins strongly suggests that this protein is a metalloprotease important in the biosynthesis of mitochondrial ATPase. Several transcript variants encoding different isoforms have been found for this gene. ATP23 metallopeptidase and ATP synthase assembly factor homolog (S. cerevisiae) NA
ENSG00000236015 AC011290.5 ENSG00000236015 NA NA NA
ENSG00000104983 CCDC61 729440 NA coiled-coil domain containing 61 NA
ENSG00000160318 CLDND2 125875 NA claudin domain containing 2 NA
ENSG00000224420 ADM5 199800 NA adrenomedullin 5 (putative) NA
ENSG00000267105 CTD-2369P2.4 ENSG00000267105 NA NA NA
ENSG00000253210 RP11-809O17.1 ENSG00000253210 NA NA NA
ENSG00000186665 C17orf58 284018 NA chromosome 17 open reading frame 58 NA
ENSG00000108107 RPL28 6158 Ribosomes, the organelles that catalyze protein synthesis, consist of a small 40S subunit and a large 60S subunit. Together these subunits are composed of 4 RNA species and approximately 80 structurally distinct proteins. This gene encodes a ribosomal protein that is a component of the 60S subunit. The protein belongs to the L28E family of ribosomal proteins. It is located in the cytoplasm. Variable expression of this gene in colorectal cancers compared to adjacent normal tissues has been observed, although no correlation between the level of expression and the severity of the disease has been found. As is typical for genes encoding ribosomal proteins, there are multiple processed pseudogenes of this gene dispersed through the genome. Alternative splicing results in multiple transcript variants encoding distinct isoforms. ribosomal protein L28 NA
ENSG00000224261 RPSAP18 ENSG00000224261 NA ribosomal protein SA pseudogene 18 NA
ENSG00000137404 NRM 11270 The protein encoded by this gene contains transmembrane domains and resides within the inner nuclear membrane, where it is tightly associated with the nucleus. This protein shares homology with isoprenylcysteine carboxymethyltransferase enzymes. Alternative splicing results in multiple transcript variants that encode different protein isoforms. nurim (nuclear envelope membrane protein) NA
ENSG00000221829 FANCG 2189 The Fanconi anemia complementation group (FANC) currently includes FANCA, FANCB, FANCC, FANCD1 (also called BRCA2), FANCD2, FANCE, FANCF, FANCG, FANCI, FANCJ (also called BRIP1), FANCL, FANCM and FANCN (also called PALB2). The previously defined group FANCH is the same as FANCA. Fanconi anemia is a genetically heterogeneous recessive disorder characterized by cytogenetic instability, hypersensitivity to DNA crosslinking agents, increased chromosomal breakage, and defective DNA repair. The members of the Fanconi anemia complementation group do not share sequence similarity; they are related by their assembly into a common nuclear protein complex. This gene encodes the protein for complementation group G. Fanconi anemia complementation group G NA
ENSG00000269560 CTD-2192J16.21 ENSG00000269560 NA NA NA
ENSG00000152147 GEMIN6 79833 GEMIN6 is part of a large macromolecular complex, localized to both the cytoplasm and the nucleus, that plays a role in the cytoplasmic assembly of small nuclear ribonucleoproteins (snRNPs). Other members of this complex include SMN (MIM 600354), GEMIN2 (SIP1; MIM 602595), GEMIN3 (DDX20; MIM 606168), GEMIN4 (MIM 606969), and GEMIN5 (MIM 607005). gem nuclear organelle associated protein 6 NA
ENSG00000243414 TICAM2 353376 TIRP is a Toll/interleukin-1 receptor (IL1R; MIM 147810) (TIR) domain-containing adaptor protein involved in Toll receptor signaling (see TLR4; MIM 603030). toll like receptor adaptor molecule 2 NA
ENSG00000232995 RGS5 8490 This gene encodes a member of the regulators of G protein signaling (RGS) family. The RGS proteins are signal transduction molecules which are involved in the regulation of heterotrimeric G proteins by acting as GTPase activators. This gene is a hypoxia-inducible factor-1 dependent, hypoxia-induced gene which is involved in the induction of endothelial apoptosis. This gene is also one of three genes on chromosome 1q contributing to elevated blood pressure. Alternatively spliced transcript variants have been identified. regulator of G-protein signaling 5 NA
ENSG00000224066 RP4-622L5.7 ENSG00000224066 NA NA NA
ENSG00000268516 CTD-3138B18.5 ENSG00000268516 NA NA NA
ENSG00000187187 ZNF546 339327 NA zinc finger protein 546 NA
ENSG00000229539 RP11-119B16.2 ENSG00000229539 NA NA NA
ENSG00000233469 ST6GALNAC4P1 ENSG00000233469 NA ST6 (alpha-N-acetyl-neuraminyl-2,3-beta-galactosyl-1,3)-N-acetylgalactosaminide alpha-2,6-sialyltransferase 4 pseudogene 1 NA
ENSG00000273116 NA NA NA NA TRUE
ENSG00000148814 LRRC27 80313 NA leucine rich repeat containing 27 NA
ENSG00000205208 C4orf46 201725 This gene encodes a small, conserved protein of unknown function that is expressed in a variety of tissues. There are pseudogenes for this gene on chromosomes 6, 8, 16, and X. Alternative splicing results in multiple transcript variants. chromosome 4 open reading frame 46 NA
ENSG00000205464 ATP6AP1L 92270 NA ATPase H+ transporting accessory protein 1 like NA
ENSG00000267575 LOC101927151 101927151 NA uncharacterized LOC101927151 NA
ENSG00000224956 NA NA NA NA TRUE
ENSG00000197568 HHLA3 11147 NA HERV-H LTR-associating 3 NA
ENSG00000267751 AC009005.2 ENSG00000267751 NA NA NA
ENSG00000083814 ZNF671 79891 NA zinc finger protein 671 NA
ENSG00000188243 COMMD6 170622 COMMD6 belongs to a family of NF-kappa-B (see RELA; MIM 164014)-inhibiting proteins characterized by the presence of a COMM domain (see COMMD1; MIM 607238) (de Bie et al., 2006 [PubMed 16573520]). COMM domain containing 6 NA
ENSG00000204519 ZNF551 90233 NA zinc finger protein 551 NA
ENSG00000164241 C5orf63 401207 NA chromosome 5 open reading frame 63 NA
ENSG00000245614 DDX11-AS1 100506660 NA DDX11 antisense RNA 1 NA
ENSG00000134253 TRIM45 80263 This gene encodes a member of the tripartite motif family. The encoded protein may function as a transcriptional repressor of the mitogen-activated protein kinase pathway. Alternatively spliced transcript variants have been described. tripartite motif containing 45 NA
ENSG00000188878 FBF1 ENSG00000188878 NA Fas (TNFRSF6) binding factor 1 NA
ENSG00000138399 FASTKD1 79675 NA FAST kinase domains 1 NA
ENSG00000204410 MSH5 4439 This gene encodes a member of the mutS family of proteins that are involved in DNA mismatch repair and meiotic recombination. This protein is similar to a Saccharomyces cerevisiae protein that participates in segregation fidelity and crossing-over events during meiosis. This protein plays a role in promoting ionizing radiation-induced apoptosis. This protein forms hetero-oligomers with another member of this family, mutS homolog 4. Polymorphisms in this gene have been linked to various human diseases, including IgA deficiency, common variable immunodeficiency, and premature ovarian failure. Alternative splicing results in multiple transcript variants. Read-through transcription also exists between this gene and the downstream chromosome 6 open reading frame 26 (C6orf26) gene. mutS homolog 5 NA
ENSG00000245261 RP3-330M21.5 ENSG00000245261 NA NA NA
ENSG00000171163 ZNF692 55657 NA zinc finger protein 692 NA
ENSG00000169964 TMEM42 131616 NA transmembrane protein 42 NA
ENSG00000131378 RFTN1 23180 NA raftlin, lipid raft linker 1 NA
ENSG00000106477 CEP41 95681 This gene encodes a centrosomal and microtubule-binding protein which is predicted to have two coiled-coil domains and a rhodanese domain. In human retinal pigment epithelial cells the protein localized to centrioles and cilia. Mutations in this gene have been associated with Joubert Syndrome 15; an autosomal recessive ciliopathy and neurological disorder. Alternative splicing results in multiple transcript variants. centrosomal protein 41 NA
ENSG00000156787 TBC1D31 93594 NA TBC1 domain family member 31 NA
ENSG00000259901 NA NA NA NA TRUE
ENSG00000233184 RP11-421L21.3 ENSG00000233184 NA NA NA
ENSG00000105173 CCNE1 898 The protein encoded by this gene belongs to the highly conserved cyclin family, whose members are characterized by a dramatic periodicity in protein abundance through the cell cycle. Cyclins function as regulators of CDK kinases. Different cyclins exhibit distinct expression and degradation patterns which contribute to the temporal coordination of each mitotic event. This cyclin forms a complex with and functions as a regulatory subunit of CDK2, whose activity is required for cell cycle G1/S transition. This protein accumulates at the G1-S phase boundary and is degraded as cells progress through S phase. Overexpression of this gene has been observed in many tumors, which results in chromosome instability, and thus may contribute to tumorigenesis. This protein was found to associate with, and be involved in, the phosphorylation of NPAT protein (nuclear protein mapped to the ATM locus), which participates in cell-cycle regulated histone gene expression and plays a critical role in promoting cell-cycle progression in the absence of pRB. cyclin E1 NA
ENSG00000260368 RP11-521I2.3 ENSG00000260368 NA NA NA
ENSG00000167081 PBX3 5090 NA PBX homeobox 3 NA
# Save the Ensembl IDs queried for factor 15, one ID per line.
write.table(out$query, paste0("../utilities/GTEX2013_sparse_load_voom/gene_names_clus_", 15, ".txt"),
            col.names = FALSE, row.names = FALSE, quote = FALSE)
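
The annotation tables contain rows flagged `notfound = TRUE` and, in some factors, the same Ensembl ID mapped to more than one record. If the saved gene list is meant to hold each matched ID exactly once, the rows could be filtered before writing. The following is a minimal sketch under that assumption, not what the analysis above does: `out_df` is a toy stand-in for `as.data.frame(out)`, and `hit`, `ids`, and `out_file` are hypothetical names introduced here for illustration.

```r
# Toy stand-in for as.data.frame(out); the real table comes from mygene::queryMany().
out_df <- data.frame(
  query    = c("ENSG00000105808", "ENSG00000105808", "ENSG00000271738", "ENSG00000168993"),
  symbol   = c("LOC102724229", "RASA4", NA, "CPLX1"),
  notfound = c(NA, NA, TRUE, NA),
  stringsAsFactors = FALSE
)

# The notfound column is absent when every query matched (as in the Factor 1 table),
# so treat all rows as hits in that case. Otherwise hits have notfound = NA.
hit <- if ("notfound" %in% names(out_df)) is.na(out_df$notfound) else rep(TRUE, nrow(out_df))

# Keep each matched Ensembl ID once, even if it maps to several records.
ids <- unique(out_df$query[hit])

# Hypothetical output location; a temporary file keeps the sketch free of side effects.
out_file <- tempfile(fileext = ".txt")
write.table(ids, out_file, col.names = FALSE, row.names = FALSE, quote = FALSE)
```

Whether unmatched IDs should be dropped depends on the downstream tool; keeping them, as the original call does, preserves the full top-feature list for each factor.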

Factor 16 Annotations

out <- mygene::queryMany(gene_list[16,],  scopes="ensembl.gene", fields=c("name", "summary", "symbol"), species="human");
## Finished
## Pass returnall=TRUE to return lists of duplicate or missing query terms.
kable(as.data.frame(out))
summary query name X_id symbol notfound
Proteins encoded by the complexin/synaphin gene family are cytosolic proteins that function in synaptic vesicle exocytosis. These proteins bind syntaxin, part of the SNAP receptor. The protein product of this gene binds to the SNAP receptor complex and disrupts it, allowing transmitter release. ENSG00000168993 complexin 1 10815 CPLX1 NA
The protein encoded by this gene is involved in the attachment of osteoclasts to the mineralized bone matrix. The encoded protein is secreted and binds hydroxyapatite with high affinity. The osteoclast vitronectin receptor is found in the cell membrane and may be involved in the binding to this protein. This protein is also a cytokine that upregulates expression of interferon-gamma and interleukin-12. Several transcript variants encoding different isoforms have been found for this gene. ENSG00000118785 secreted phosphoprotein 1 6696 SPP1 NA
This gene encodes a member of the paralemmin protein family. The product of this gene is a prenylated and palmitoylated phosphoprotein that associates with the cytoplasmic face of plasma membranes and is implicated in plasma membrane dynamics in neurons and other cell types. Several alternatively spliced transcript variants have been identified, but the full-length nature of only two transcript variants has been determined. ENSG00000099864 paralemmin 5064 PALM NA
The protein encoded by this gene may play a role in the attachment of stem cells to the bone marrow extracellular matrix or to stromal cells. This single-pass membrane protein is highly glycosylated and phosphorylated by protein kinase C. Two transcript variants encoding different isoforms have been found for this gene. ENSG00000174059 CD34 molecule 947 CD34 NA
This gene belongs to the family of reticulon encoding genes. Reticulons are associated with the endoplasmic reticulum, and are involved in neuroendocrine secretion or in membrane trafficking in neuroendocrine cells. This gene is considered to be a specific marker for neurological diseases and cancer, and is a potential molecular target for therapy. Alternative splicing results in multiple transcript variants. ENSG00000139970 reticulon 1 6252 RTN1 NA
The protein encoded by this gene is similar to insulin in function and structure and is a member of a family of proteins involved in mediating growth and development. The encoded protein is processed from a precursor, bound by a specific receptor, and secreted. Defects in this gene are a cause of insulin-like growth factor I deficiency. Alternative splicing results in multiple transcript variants encoding different isoforms that may undergo similar processing to generate mature protein. ENSG00000017427 insulin like growth factor 1 3479 IGF1 NA
This gene encodes a protein with an arfaptin homology domain that is found both in the cytosol and as membrane-bound form on the Golgi complex and immature secretory granules. This protein is believed to be an autoantigen in insulin-dependent diabetes mellitus and primary Sjogren’s syndrome. Several transcript variants encoding two different isoforms have been found for this gene. ENSG00000003147 islet cell autoantigen 1 3382 ICA1 NA
cAMP is a signaling molecule important for a variety of cellular functions. cAMP exerts its effects by activating the cAMP-dependent protein kinase, which transduces the signal through phosphorylation of different target proteins. The inactive kinase holoenzyme is a tetramer composed of two regulatory and two catalytic subunits. cAMP causes the dissociation of the inactive holoenzyme into a dimer of regulatory subunits bound to four cAMP and two free monomeric catalytic subunits. Four different regulatory subunits and three catalytic subunits have been identified in humans. The protein encoded by this gene is one of the regulatory subunits. This subunit can be phosphorylated by the activated catalytic subunit. This subunit has been shown to interact with and suppress the transcriptional activity of the cAMP responsive element binding protein 1 (CREB1) in activated T cells. Knockout studies in mice suggest that this subunit may play an important role in regulating energy balance and adiposity. The studies also suggest that this subunit may mediate the gene induction and cataleptic behavior induced by haloperidol. ENSG00000005249 protein kinase cAMP-dependent type II regulatory subunit beta 5577 PRKAR2B NA
This gene encodes a multi-domain secreted protein that may have a critical role in ocular and limb development. Mutations in this gene are associated with microphthalmia and limb anomalies. Alternatively spliced transcript variants encoding different isoforms have been found for this gene. ENSG00000198732 SPARC related modular calcium binding 1 64093 SMOC1 NA
The protein encoded by this gene is a member of the RAMP family of single-transmembrane-domain proteins, called receptor (calcitonin) activity modifying proteins (RAMPs). RAMPs are type I transmembrane proteins with an extracellular N terminus and a cytoplasmic C terminus. RAMPs are required to transport calcitonin-receptor-like receptor (CRLR) to the plasma membrane. CRLR, a receptor with seven transmembrane domains, can function as either a calcitonin-gene-related peptide (CGRP) receptor or an adrenomedullin receptor, depending on which members of the RAMP family are expressed. In the presence of this (RAMP2) protein, CRLR functions as an adrenomedullin receptor. The RAMP2 protein is involved in core glycosylation and transportation of adrenomedullin receptor to the cell surface. ENSG00000131477 receptor activity modifying protein 2 10266 RAMP2 NA
NA ENSG00000271738 NA NA NA TRUE
NA ENSG00000130300 plasmalemma vesicle associated protein 83483 PLVAP NA
NA ENSG00000186994 KN motif and ankyrin repeat domains 3 256949 KANK3 NA
The protein encoded by this gene is a member of the dual specificity protein phosphatase subfamily. These phosphatases inactivate their target kinases by dephosphorylating both the phosphoserine/threonine and phosphotyrosine residues. They negatively regulate members of the mitogen-activated protein (MAP) kinase superfamily (MAPK/ERK, SAPK/JNK, p38), which is associated with cellular proliferation and differentiation. Different members of the family of dual specificity phosphatases show distinct substrate specificities for various MAP kinases, different tissue distribution and subcellular localization, and different modes of inducibility of their expression by extracellular stimuli. This gene product inactivates SAPK/JNK and p38, is expressed predominantly in the adult brain, heart, and skeletal muscle, is localized in the cytoplasm, and is induced by nerve growth factor and insulin. An intronless pseudogene for DUSP8 is present on chromosome 10q11.2. ENSG00000184545 dual specificity phosphatase 8 1850 DUSP8 NA
NA ENSG00000197291 RAMP2 antisense RNA 1 100190938 RAMP2-AS1 NA
NA ENSG00000122378 family with sequence similarity 213 member A 84293 FAM213A NA
This gene encodes cytosolic alanine aminotransaminase 1 (ALT1); also known as glutamate-pyruvate transaminase 1. This enzyme catalyzes the reversible transamination between alanine and 2-oxoglutarate to generate pyruvate and glutamate and, therefore, plays a key role in the intermediary metabolism of glucose and amino acids. Serum activity levels of this enzyme are routinely used as a biomarker of liver injury caused by drug toxicity, infection, alcohol, and steatosis. A related gene on chromosome 16 encodes a putative mitochondrial alanine aminotransaminase. ENSG00000167701 glutamic-pyruvate transaminase (alanine aminotransferase) 2875 GPT NA
NA ENSG00000226009 KCNIP2 antisense RNA 1 ENSG00000226009 KCNIP2-AS1 NA
NA ENSG00000135447 protein phosphatase 1 regulatory inhibitor subunit 1A 5502 PPP1R1A NA
The protein encoded by this gene is a Golgi stack membrane protein that is involved in the creation of a precursor of the H antigen, which is required for the final step in the soluble A and B antigen synthesis pathway. This gene is one of two encoding the galactoside 2-L-fucosyltransferase enzyme. Mutations in this gene are a cause of the H-Bombay blood group. ENSG00000174951 fucosyltransferase 1 (H blood group) 2523 FUT1 NA
NA ENSG00000260912 NA ENSG00000260912 RP11-363E7.4 NA
The protein encoded by this gene coats lipid storage droplets in adipocytes, thereby protecting them until they can be broken down by hormone-sensitive lipase. The encoded protein is the major cAMP-dependent protein kinase substrate in adipocytes and, when unphosphorylated, may play a role in the inhibition of lipolysis. Alternatively spliced transcript variants varying in the 5’ UTR, but encoding the same protein, have been found for this gene. ENSG00000166819 perilipin 1 5346 PLIN1 NA
This gene encodes a member of the elastin microfibril interface-located (EMILIN) protein family. This family member is an extracellular matrix glycoprotein that can interfere with tumor angiogenesis and growth. It serves as a transforming growth factor beta antagonist and can interfere with the VEGF-A/VEGFR2 pathway. A related pseudogene has been identified on chromosome 6. ENSG00000173269 multimerin 2 79812 MMRN2 NA
This gene encodes a member of the NOTCH family of proteins. Members of this Type I transmembrane protein family share structural characteristics including an extracellular domain consisting of multiple epidermal growth factor-like (EGF) repeats, and an intracellular domain consisting of multiple different domain types. Notch signaling is an evolutionarily conserved intercellular signaling pathway that regulates interactions between physically adjacent cells through binding of Notch family receptors to their cognate ligands. The encoded preproprotein is proteolytically processed in the trans-Golgi network to generate two polypeptide chains that heterodimerize to form the mature cell-surface receptor. This receptor may play a role in vascular, renal and hepatic development. Mutations in this gene may be associated with schizophrenia. Alternative splicing results in multiple transcript variants, at least one of which encodes an isoform that is proteolytically processed. ENSG00000204301 notch 4 4855 NOTCH4 NA
This gene encodes a member of the SLC29A/ENT transporter protein family. The encoded membrane protein catalyzes the reuptake of monoamines into presynaptic neurons, thus determining the intensity and duration of monoamine neural signaling. It has been shown to transport several compounds, including serotonin, dopamine, and the neurotoxin 1-methyl-4-phenylpyridinium. Alternative splicing results in multiple transcript variants. ENSG00000164638 solute carrier family 29 member 4 222962 SLC29A4 NA
This gene encodes a type I membrane glycoprotein containing two extracellular immunoglobulin domains, a transmembrane and a cytoplasmic domain. This gene is expressed by various cell types, including B cells, a subset of T cells, thymocytes, endothelial cells, and neurons. The encoded protein plays an important role in immunosuppression and regulation of anti-tumor activity. Alternative splicing results in multiple transcript variants encoding different isoforms. ENSG00000091972 CD200 molecule 4345 CD200 NA
This gene encodes a member of the FXYD family of transmembrane proteins. This particular protein encodes phosphohippolin, which likely affects the activity of Na,K-ATPase. Multiple alternatively spliced transcript variants encoding the same protein have been described. Related pseudogenes have been identified on chromosomes 10 and X. Read-through transcripts have been observed between this locus and the downstream sodium/potassium-transporting ATPase subunit gamma (FXYD2, GeneID 486) locus. ENSG00000137726 FXYD domain containing ion transport regulator 6 53826 FXYD6 NA
NA ENSG00000254528 NA ENSG00000254528 RP11-728F11.4 NA
NA ENSG00000182118 family with sequence similarity 89 member A 375061 FAM89A NA
The beta-adrenergic receptor kinase specifically phosphorylates the agonist-occupied form of the beta-adrenergic and related G protein-coupled receptors. Overall, the beta adrenergic receptor kinase 2 has 85% amino acid similarity with beta adrenergic receptor kinase 1, with the protein kinase catalytic domain having 95% similarity. These data suggest the existence of a family of receptor kinases which may serve broadly to regulate receptor function. ENSG00000100077 G protein-coupled receptor kinase 3 157 GRK3 NA
NA ENSG00000222328 RNA, U2 small nuclear 2, pseudogene ENSG00000222328 RNU2-2P NA
This gene encodes a member of the family of voltage-gated potassium (Kv) channel-interacting proteins (KCNIPs), which belongs to the recoverin branch of the EF-hand superfamily. Members of the KCNIP family are small calcium binding proteins. They all have EF-hand-like domains, and differ from each other in the N-terminus. They are integral subunit components of native Kv4 channel complexes. They may regulate A-type currents, and hence neuronal excitability, in response to changes in intracellular calcium. Multiple alternatively spliced transcript variants encoding distinct isoforms have been identified from this gene. ENSG00000120049 potassium voltage-gated channel interacting protein 2 30819 KCNIP2 NA
The protein encoded by this gene belongs to the family of P-type cation transport ATPases, and to the subfamily of Na+/K+ -ATPases. Na+/K+ -ATPase is an integral membrane protein responsible for establishing and maintaining the electrochemical gradients of Na and K ions across the plasma membrane. These gradients are essential for osmoregulation, for sodium-coupled transport of a variety of organic and inorganic molecules, and for electrical excitability of nerve and muscle. This enzyme is composed of two subunits, a large catalytic subunit (alpha) and a smaller glycoprotein subunit (beta). The catalytic subunit of Na+/K+ -ATPase is encoded by multiple genes. This gene encodes an alpha 2 subunit. Mutations in this gene result in familial basilar or hemiplegic migraines, and in a rare syndrome known as alternating hemiplegia of childhood. ENSG00000018625 ATPase Na+/K+ transporting subunit alpha 2 477 ATP1A2 NA
The Notch signaling pathway is an intercellular signaling mechanism that is essential for proper embryonic development. Members of the Notch gene family encode transmembrane receptors that are critical for various cell fate decisions. The protein encoded by this gene is one of several ligands that activate Notch and related receptors. Two transcript variants encoding different isoforms have been found for this gene. ENSG00000184916 jagged 2 3714 JAG2 NA
The protein encoded by this gene is a member of the dual specificity protein phosphatase subfamily. These phosphatases inactivate their target kinases by dephosphorylating both the phosphoserine/threonine and phosphotyrosine residues. They negatively regulate members of the mitogen-activated protein (MAP) kinase superfamily (MAPK/ERK, SAPK/JNK, p38), which are associated with cellular proliferation and differentiation. Different members of the family of dual specificity phosphatases show distinct substrate specificities for various MAP kinases, different tissue distribution and subcellular localization, and different modes of inducibility of their expression by extracellular stimuli. This gene product inactivates ERK1, ERK2 and JNK, is expressed in a variety of tissues, and is localized in the nucleus. Two alternatively spliced transcript variants, encoding distinct isoforms, have been observed for this gene. In addition, multiple polyadenylation sites have been reported. ENSG00000120875 dual specificity phosphatase 4 1846 DUSP4 NA
NA ENSG00000164849 G protein-coupled receptor 146 115330 GPR146 NA
NA ENSG00000177685 calcium release activated channel regulator 2B 283229 CRACR2B NA
NA ENSG00000257607 NA ENSG00000257607 RP11-449P15.1 NA
This gene encodes a protein which contains a C-terminal domain able to interact with the angiotensin II (AT2) receptor and a large coiled-coil region allowing dimerization. Multiple alternatively spliced transcript variants encoding different isoforms have been found for this gene. One of the transcript variants has been shown to encode a mitochondrial protein that acts as a tumor suppressor and participates in AT2 signaling pathways. Other variants may encode nuclear or transmembrane proteins but it has not been determined whether they also participate in AT2 signaling pathways. ENSG00000129422 microtubule associated tumor suppressor 1 57509 MTUS1 NA
The gene is part of a 3-member transmembrane receptor kinase receptor family with a processed pseudogene distal on chromosome 15. The encoded protein is activated by the products of the growth arrest-specific gene 6 and protein S genes and is involved in controlling cell survival and proliferation, spermatogenesis, immunoregulation and phagocytosis. The encoded protein has also been identified as a cell entry factor for Ebola and Marburg viruses. ENSG00000092445 TYRO3 protein tyrosine kinase 7301 TYRO3 NA
Members of the perilipin family, such as PLIN4, coat intracellular lipid storage droplets (Wolins et al., 2003 [PubMed 12840023]). ENSG00000167676 perilipin 4 729359 PLIN4 NA
NA ENSG00000258603 NA ENSG00000258603 RP3-414A15.10 NA
NA ENSG00000239911 PRKAG2 antisense RNA 1 ENSG00000239911 PRKAG2-AS1 NA
This gene is a member of the calcium/calmodulin-dependent protein kinase 1 family, a subfamily of the serine/threonine kinases. The encoded protein is a component of the calcium-regulated calmodulin-dependent protein kinase cascade. It has been associated with multiple processes including regulation of granulocyte function, activation of CREB-dependent gene transcription, aldosterone synthesis, differentiation and activation of neutrophil cells, and apoptosis of erythroleukemia cells. Alternatively spliced transcript variants encoding different isoforms of this gene have been described. ENSG00000183049 calcium/calmodulin dependent protein kinase ID 57118 CAMK1D NA
NA ENSG00000163053 solute carrier family 16 member 14 151473 SLC16A14 NA
The protein encoded by this gene is a member of the L1 gene family of neural cell adhesion molecules. It is a neural recognition molecule that may be involved in signal transduction pathways. The deletion of one copy of this gene may be responsible for mental defects in patients with 3p- syndrome. This protein may also play a role in the growth of certain cancers. Alternate splicing results in both coding and non-coding variants. ENSG00000134121 cell adhesion molecule L1 like 10752 CHL1 NA
NA ENSG00000267992 NA ENSG00000267992 CTB-189B5.3 NA
NIPSNAP3B belongs to a family of proteins with putative roles in vesicular trafficking (Buechler et al., 2004 [PubMed 15177564]). ENSG00000165028 nipsnap homolog 3B 55335 NIPSNAP3B NA
Due to its chemical instability and low solubility in aqueous solution, vitamin A requires cellular retinol-binding proteins (CRBPs), such as RBP7, for stability, internalization, intercellular transfer, homeostasis, and metabolism. ENSG00000162444 retinol binding protein 7 116362 RBP7 NA
The protein encoded by this gene is an adenosine receptor that belongs to the G-protein coupled receptor 1 family. There are 3 types of adenosine receptors, each with a specific pattern of ligand binding and tissue distribution, and together they regulate a diverse set of physiologic functions. The type A1 receptors inhibit adenylyl cyclase, and play a role in the fertilization process. Animal studies also suggest a role for A1 receptors in kidney function and ethanol intoxication. Transcript variants with alternative splicing in the 5’ UTR have been found for this gene. ENSG00000163485 adenosine A1 receptor 134 ADORA1 NA
The protein encoded by this gene is a major apoprotein of the chylomicron. It binds to a specific liver and peripheral cell receptor, and is essential for the normal catabolism of triglyceride-rich lipoprotein constituents. This gene maps to chromosome 19 in a cluster with the related apolipoprotein C1 and C2 genes. Mutations in this gene result in familial dysbetalipoproteinemia, or type III hyperlipoproteinemia (HLP III), in which increased plasma cholesterol and triglycerides are the consequence of impaired clearance of chylomicron and VLDL remnants. Alternative splicing results in multiple transcript variants. ENSG00000130203 apolipoprotein E 348 APOE NA
The protein encoded by this gene has a long and a short form, generated by use of alternative translational start codons. The long form is expressed in steroidogenic tissues such as testis, where it converts cholesteryl esters to free cholesterol for steroid hormone production. The short form is expressed in adipose tissue, among others, where it hydrolyzes stored triglycerides to free fatty acids. ENSG00000079435 lipase E, hormone sensitive type 3991 LIPE NA
NA ENSG00000203685 stum, mechanosensory transduction mediator homolog 375057 STUM NA
The protein encoded by this gene is a member of a G protein subfamily that mediates signal transduction in pertussis toxin-insensitive systems. This encoded protein may play a role in maintaining the ionic balance of perilymphatic and endolymphatic cochlear fluids. ENSG00000128266 G protein subunit alpha z 2781 GNAZ NA
Members of the F-box protein family, such as FBXO27, are characterized by an approximately 40-amino acid F-box motif. SCF complexes, formed by SKP1 (MIM 601434), cullin (see CUL1; MIM 603134), and F-box proteins, act as protein-ubiquitin ligases. F-box proteins interact with SKP1 through the F box, and they interact with ubiquitination targets through other protein interaction domains (Jin et al., 2004 [PubMed 15520277]). ENSG00000161243 F-box protein 27 126433 FBXO27 NA
NA ENSG00000229299 NA ENSG00000229299 RP4-583P15.10 NA
This gene encodes an adaptor protein and member of a cytoplasmic protein family involved in cell migration. The encoded protein contains a putative Src homology 2 (SH2) domain and guanine nucleotide exchange factor-like domain which allows this signaling protein to form a complex with scaffolding protein Crk-associated substrate. Multiple transcript variants encoding different isoforms have been found for this gene. ENSG00000095370 SH2 domain containing 3C 10044 SH2D3C NA
The protein encoded by this gene belongs to the integrin alpha chain family. Integrins are heterodimeric integral membrane proteins composed of an alpha chain and a beta chain. They mediate a wide spectrum of cell-cell and cell-matrix interactions, and thus play a role in cell migration, morphologic development, differentiation, and metastasis. This protein functions as a receptor for the basement membrane protein laminin-1. It is mainly expressed in skeletal and cardiac muscles and may be involved in differentiation and migration processes during myogenesis. Defects in this gene are associated with congenital myopathy. Alternatively spliced transcript variants encoding different isoforms have been noted for this gene. ENSG00000135424 integrin subunit alpha 7 3679 ITGA7 NA
Synaptic vesicle membrane docking and fusion is mediated by SNAREs (soluble N-ethylmaleimide-sensitive factor attachment protein receptors) located on the vesicle membrane (v-SNAREs) and the target membrane (t-SNAREs). The assembled v-SNARE/t-SNARE complex consists of a bundle of four helices, one of which is supplied by v-SNARE and the other three by t-SNARE. For t-SNAREs on the plasma membrane, the protein syntaxin supplies one helix and the protein encoded by this gene contributes the other two. Therefore, this gene product is a presynaptic plasma membrane protein involved in the regulation of neurotransmitter release. Two alternative transcript variants encoding different protein isoforms have been described for this gene. ENSG00000132639 synaptosome associated protein 25 6616 SNAP25 NA
This gene encodes a member of the FAM69 family of cysteine-rich type II transmembrane proteins. These proteins localize to the endoplasmic reticulum but their specific functions are unknown. ENSG00000165716 family with sequence similarity 69 member B 138311 FAM69B NA
NA ENSG00000239218 ribosomal protein S20 pseudogene 22 ENSG00000239218 RPS20P22 NA
The protein encoded by this gene is a member of the intercellular adhesion molecule (ICAM) family. All ICAM proteins are type I transmembrane glycoproteins, contain 2-9 immunoglobulin-like C2-type domains, and bind to the leukocyte adhesion LFA-1 protein. This protein may play a role in lymphocyte recirculation by blocking LFA-1-dependent cell adhesion. It mediates adhesive interactions important for antigen-specific immune response, NK-cell mediated clearance, lymphocyte recirculation, and other cellular interactions important for immune response and surveillance. Several transcript variants encoding the same protein have been found for this gene. ENSG00000108622 intercellular adhesion molecule 2 3384 ICAM2 NA
NA ENSG00000214578 high mobility group nucleosomal binding domain 2 pseudogene 15 ENSG00000214578 HMGN2P15 NA
NA ENSG00000205959 NA ENSG00000205959 RP11-689P11.2 NA
This gene encodes a serine/threonine protein kinase. Although this gene product is similar to serum- and glucocorticoid-induced protein kinase (SGK), this gene is not induced by serum or glucocorticoids. This gene is induced in response to signals that activate phosphatidylinositol 3-kinase, which is also true for SGK. Alternative splicing results in multiple transcript variants. ENSG00000101049 SGK2, serine/threonine kinase 2 10110 SGK2 NA
The sphingolipid metabolite sphingosine-1-phosphate promotes cell proliferation and survival, whereas its precursor, sphingosine, has the opposite effect. The ceramidase ACER2 hydrolyzes very long chain ceramides to generate sphingosine (Xu et al., 2006 [PubMed 16940153]). ENSG00000177076 alkaline ceramidase 2 340485 ACER2 NA
NA ENSG00000176485 phospholipase A2 group XVI 11145 PLA2G16 NA
NA ENSG00000256604 NA NA NA TRUE
NA ENSG00000272678 NA ENSG00000272678 RP11-797D24.4 NA
NA ENSG00000257622 NA ENSG00000257622 RP11-44N21.4 NA
FABP4 encodes the fatty acid binding protein found in adipocytes. Fatty acid binding proteins are a family of small, highly conserved, cytoplasmic proteins that bind long-chain fatty acids and other hydrophobic ligands. It is thought that FABPs’ roles include fatty acid uptake, transport, and metabolism. ENSG00000170323 fatty acid binding protein 4 2167 FABP4 NA
This gene encodes a protein belonging to the GTP-binding superfamily and to the immuno-associated nucleotide (IAN) subfamily of nucleotide-binding proteins. In humans, the IAN subfamily genes are located in a cluster at 7q36.1. This gene encodes an antiapoptotic protein that functions in T-cell survival. Polymorphisms in this gene are associated with systemic lupus erythematosus. Read-through transcription exists between this gene and the neighboring upstream GIMAP1 (GTPase, IMAP family member 1) gene. ENSG00000196329 GTPase, IMAP family member 5 55340 GIMAP5 NA
This gene likely encodes a member of the carboxypeptidase family of proteins. Cloning of a comparable locus in mouse indicates that the encoded protein contains a discoidin domain and a carboxypeptidase domain, but the protein appears to lack residues necessary for carboxypeptidase activity. ENSG00000088882 carboxypeptidase X (M14 family), member 1 56265 CPXM1 NA
NA ENSG00000256661 A2ML1 antisense RNA 1 ENSG00000256661 A2ML1-AS1 NA
Acetyl-CoA carboxylase (ACC) is a complex multifunctional enzyme system. ACC is a biotin-containing enzyme which catalyzes the carboxylation of acetyl-CoA to malonyl-CoA, the rate-limiting step in fatty acid synthesis. ACC-beta is thought to control fatty acid oxidation by means of the ability of malonyl-CoA to inhibit carnitine-palmitoyl-CoA transferase I, the rate-limiting step in fatty acid uptake and oxidation by mitochondria. ACC-beta may be involved in the regulation of fatty acid oxidation, rather than fatty acid biosynthesis. There is evidence for the presence of two ACC-beta isoforms. ENSG00000076555 acetyl-CoA carboxylase beta 32 ACACB NA
This gene encodes a nuclear protein belonging to the hairy and enhancer of split-related (HESR) family of basic helix-loop-helix (bHLH)-type transcriptional repressors. Expression of this gene is induced by the Notch and c-Jun signal transduction pathways. Two similar and redundant genes in mouse are required for embryonic cardiovascular development, and are also implicated in neurogenesis and somitogenesis. Alternative splicing results in multiple transcript variants. ENSG00000164683 hes related family bHLH transcription factor with YRPW motif 1 23462 HEY1 NA
NA ENSG00000225792 NA ENSG00000225792 AC004540.4 NA
NA ENSG00000139597 NEDD4 binding protein 2-like 1 90634 N4BP2L1 NA
NA ENSG00000268358 NA NA NA TRUE
NA ENSG00000256633 NA ENSG00000256633 RP11-169D4.2 NA
This gene encodes a protein containing several protein-protein interaction domains, including ankyrin-like repeats, a coiled-coil domain, and an ATP/GTP-binding motif. The encoded protein interacts with alpha-synuclein in neuronal tissue and may play a role in the formation of cytoplasmic inclusions and neurodegeneration. A mutation in this gene has been associated with Parkinson’s disease. Alternative splicing results in multiple transcript variants. ENSG00000064692 synuclein alpha interacting protein 9627 SNCAIP NA
NA ENSG00000105808 uncharacterized LOC102724229 102724229 LOC102724229 NA
This gene encodes a member of the GAP1 family of GTPase-activating proteins that suppresses the Ras/mitogen-activated protein kinase pathway in response to Ca(2+). Stimuli that increase intracellular Ca(2+) levels result in the translocation of this protein to the plasma membrane, where it activates Ras GTPase activity. Consequently, Ras is converted from the active GTP-bound state to the inactive GDP-bound state and no longer activates downstream pathways that regulate gene expression, cell growth, and differentiation. Multiple transcript variants encoding different isoforms have been found for this gene. ENSG00000105808 RAS p21 protein activator 4 10156 RASA4 NA
The protein encoded by this gene is a member of the G-protein coupled receptor family 2. This protein is a receptor for parathyroid hormone (PTH) and for parathyroid hormone-like hormone (PTHLH). The activity of this receptor is mediated by G proteins which activate adenylyl cyclase and also a phosphatidylinositol-calcium second messenger system. Defects in this receptor are known to be the cause of Jansen’s metaphyseal chondrodysplasia (JMC), chondrodysplasia Blomstrand type (BOCD), as well as enchondromatosis. Two transcript variants encoding the same protein have been found for this gene. ENSG00000160801 parathyroid hormone 1 receptor 5745 PTH1R NA
The membrane-associated protein encoded by this gene is a member of the superfamily of ATP-binding cassette (ABC) transporters. ABC proteins transport various molecules across extra- and intracellular membranes. ABC genes are divided into seven distinct subfamilies (ABC1, MDR/TAP, MRP, ALD, OABP, GCN20, White). This protein is a member of the ABC1 subfamily. Members of the ABC1 subfamily comprise the only major ABC subfamily found exclusively in multicellular eukaryotes. The full transporter encoded by this gene may be involved in development of resistance to xenobiotics and engulfment during programmed cell death. ENSG00000167972 ATP binding cassette subfamily A member 3 21 ABCA3 NA
In the mouse, Nkd is a Dishevelled (see DVL1; MIM 601365)-binding protein that functions as a negative regulator of the Wnt (see WNT1; MIM 164820)-beta-catenin (see MIM 116806)-Tcf (see MIM 602272) signaling pathway. ENSG00000140807 naked cuticle homolog 1 85407 NKD1 NA
NA ENSG00000237248 long intergenic non-protein coding RNA 987 100499405 LINC00987 NA
This gene encodes a member of the bombesin-like family of neuropeptides, which negatively regulate eating behavior. The encoded protein may regulate colonic smooth muscle contraction through binding to its cognate receptor, the neuromedin B receptor (NMBR). Polymorphisms of this gene may be associated with hunger, weight gain and obesity. Alternative splicing results in multiple transcript variants. ENSG00000197696 neuromedin B 4828 NMB NA
This gene encodes a transcription factor that is a member of the nuclear receptor subfamily 1. The encoded protein is a ligand-sensitive transcription factor that negatively regulates the expression of core clock proteins. In particular this protein represses the circadian clock transcription factor aryl hydrocarbon receptor nuclear translocator-like protein 1 (ARNTL). This protein may also be involved in regulating genes that function in metabolic, inflammatory and cardiovascular processes. ENSG00000126368 nuclear receptor subfamily 1 group D member 1 9572 NR1D1 NA
The protein encoded by this gene is a member of the type 3 G protein-coupled receptor family. Members of this superfamily are characterized by a signature 7-transmembrane domain motif. The specific function of this protein is unknown; however, this protein may mediate the cellular effects of retinoic acid on the G protein signal transduction cascade. Two transcript variants encoding different isoforms have been found for this gene. ENSG00000170412 G protein-coupled receptor class C group 5 member C 55890 GPRC5C NA
APM2 gene is exclusively expressed in adipose tissue. Its function is currently unknown. ENSG00000148671 adipogenesis regulatory factor 10974 ADIRF NA
Members of the perilipin family, such as PLIN5, coat intracellular lipid storage droplets and protect them from lipolytic degradation (Dalen et al., 2007 [PubMed 17234449]). ENSG00000214456 perilipin 5 440503 PLIN5 NA
NA ENSG00000117461 phosphoinositide-3-kinase regulatory subunit 3 8503 PIK3R3 NA
The protein encoded by this gene belongs to the cyclic nucleotide phosphodiesterase (PDE) family, and PDE1 subfamily. Members of the PDE1 family are calmodulin-dependent PDEs that are stimulated by a calcium-calmodulin complex. This PDE has dual-specificity for the second messengers, cAMP and cGMP, with a preference for cGMP as a substrate. cAMP and cGMP function as key regulators of many important physiological processes. Alternatively spliced transcript variants encoding different isoforms have been described for this gene. ENSG00000123360 phosphodiesterase 1B 5153 PDE1B NA
The protein encoded by this gene belongs to the dpy-19 family. It is highly expressed in testis, and is required for sperm head elongation and acrosome formation during spermatogenesis. Mutations in this gene are associated with an infertility disorder, spermatogenic failure type 9 (SPGF9). ENSG00000177990 dpy-19 like 2 283417 DPY19L2 NA
The transmembrane semaphorin SEMA6A is expressed in developing neural tissue and is required for proper development of the thalamocortical projection (Leighton et al., 2001 [PubMed 11242070]). ENSG00000092421 semaphorin 6A 57556 SEMA6A NA
The protein encoded by this gene contains six PDZ domains and shares sequence similarity with pro-interleukin-16 (pro-IL-16). Like pro-IL-16, the encoded protein localizes to the endoplasmic reticulum and is thought to be cleaved by a caspase to produce a secreted peptide containing two PDZ domains. In addition, this gene is upregulated in primary prostate tumors and may be involved in the early stages of prostate tumorigenesis. ENSG00000133401 PDZ domain containing 2 23037 PDZD2 NA
This gene encodes a member of the ankyrin family of proteins that link the integral membrane proteins to the underlying spectrin-actin cytoskeleton. Ankyrins play key roles in activities such as cell motility, activation, proliferation, contact and the maintenance of specialized membrane domains. Most ankyrins are typically composed of three structural domains: an amino-terminal domain containing multiple ankyrin repeats; a central region with a highly conserved spectrin binding domain; and a carboxy-terminal regulatory domain which is the least conserved and subject to variation. The protein encoded by this gene is required for targeting and stability of Na/Ca exchanger 1 in cardiomyocytes. Mutations in this gene cause long QT syndrome 4 and cardiac arrhythmia syndrome. Multiple transcript variants encoding different isoforms have been described. ENSG00000145362 ankyrin 2, neuronal 287 ANK2 NA
NA ENSG00000156750 NA NA NA TRUE
This gene encodes a member of the regulator of calcineurin (RCAN) protein family. These proteins play a role in many physiological processes by binding to the catalytic domain of calcineurin A, inhibiting calcineurin-mediated nuclear translocation of the transcription factor NFATC1. Expression of this gene in skin fibroblasts is upregulated by thyroid hormone, and the encoded protein may also play a role in endothelial cell function and angiogenesis. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene. ENSG00000172348 regulator of calcineurin 2 10231 RCAN2 NA
Spectrin is an actin crosslinking and molecular scaffold protein that links the plasma membrane to the actin cytoskeleton, and functions in the determination of cell shape, arrangement of transmembrane proteins, and organization of organelles. It is composed of two antiparallel dimers of alpha- and beta- subunits. This gene is one member of a family of beta-spectrin genes. The encoded protein localizes to the nuclear matrix, PML nuclear bodies, and cytoplasmic vesicles. A highly similar gene in the mouse is required for localization of specific membrane proteins in polarized regions of neurons. Multiple transcript variants encoding different isoforms have been found for this gene. ENSG00000160460 spectrin beta, non-erythrocytic 4 57731 SPTBN4 NA
write.table(as.factor(out$query),
            paste0("../utilities/GTEX2013_sparse_load_voom/gene_names_clus_", 16, ".txt"),
            col.names = FALSE, row.names = FALSE, quote = FALSE)
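
The query-then-write step above is repeated once per factor with only the factor index changing. A minimal sketch of how that step could be wrapped in a helper is given below. The function name write_factor_genes, its arguments, and the toy data frame are hypothetical and not part of the original analysis; only the file name pattern and the write.table() options are taken from the call above. The loop over factors is left commented out because it needs network access and the mygene package.

```r
# Hypothetical helper (not part of the original analysis): write the queried
# Ensembl IDs for one factor to "gene_names_clus_<factor_id>.txt" in out_dir,
# using the same write.table() options as the call above.
write_factor_genes <- function(annotations, factor_id, out_dir) {
  ids <- as.character(annotations$query)
  path <- file.path(out_dir, paste0("gene_names_clus_", factor_id, ".txt"))
  write.table(ids, path, col.names = FALSE, row.names = FALSE, quote = FALSE)
  invisible(path)
}

# Sketch of the per-factor loop (assumes gene_list from earlier in the document,
# network access, and the mygene package):
# for (k in seq_len(nrow(gene_list))) {
#   out <- mygene::queryMany(gene_list[k, ], scopes = "ensembl.gene",
#                            fields = c("name", "summary", "symbol"),
#                            species = "human")
#   write_factor_genes(as.data.frame(out), k,
#                      "../utilities/GTEX2013_sparse_load_voom")
# }

# Offline demonstration with a toy annotation table written to a temp directory.
toy <- data.frame(query = c("ENSG00000137392", "ENSG00000185615"),
                  symbol = c("CLPS", "PDIA2"),
                  stringsAsFactors = FALSE)
demo_path <- write_factor_genes(toy, 17, tempdir())
readLines(demo_path)
```

The helper writes every queried ID, including duplicates and IDs that mygene did not find, which matches the behaviour of the original call; filtering on the notfound column would be a separate design choice.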

Factor 17 Annotations

out <- mygene::queryMany(gene_list[17, ], scopes = "ensembl.gene", fields = c("name", "summary", "symbol"), species = "human")
## Finished
## Pass returnall=TRUE to return lists of duplicate or missing query terms.
kable(as.data.frame(out))
query X_id name summary symbol notfound
ENSG00000137392 1208 colipase The protein encoded by this gene is a cofactor needed by pancreatic lipase for efficient dietary lipid hydrolysis. It binds to the C-terminal, non-catalytic domain of lipase, thereby stabilizing an active conformation and considerably increasing the overall hydrophobic binding site. The gene product allows lipase to anchor noncovalently to the surface of lipid micelles, counteracting the destabilizing influence of intestinal bile salts. This cofactor is only expressed in pancreatic acinar cells, suggesting regulation of expression by tissue-specific elements. Three transcript variants encoding different isoforms have been found for this gene. CLPS NA
ENSG00000185615 64714 protein disulfide isomerase family A member 2 Protein disulfide isomerases (EC 5.3.4.1), such as PDIP, are endoplasmic reticulum (ER) resident proteins that catalyze protein folding and thiol-disulfide interchange reactions (Desilva et al., 1996 [PubMed 8561901]). PDIA2 NA
ENSG00000172023 5968 regenerating family member 1 beta This gene is a type I subclass member of the Reg gene family. The Reg gene family is a multigene family grouped into four subclasses, types I, II, III and IV based on the primary structures of the encoded proteins. This gene encodes a protein secreted by the exocrine pancreas that is highly similar to the REG1A protein. The related REG1A protein is associated with islet cell regeneration and diabetogenesis, and may be involved in pancreatic lithogenesis. Reg family members REG1A, REGL, PAP and this gene are tandemly clustered on chromosome 2p12 and may have arisen from the same ancestral gene by gene duplication. REG1B NA
ENSG00000168928 440387 chymotrypsinogen B2 NA CTRB2 NA
ENSG00000187021 5407 pancreatic lipase related protein 1 NA PNLIPRP1 NA
ENSG00000219073 23436 chymotrypsin like elastase family member 3B Elastases form a subfamily of serine proteases that hydrolyze many proteins in addition to elastin. Humans have six elastase genes which encode the structurally similar proteins elastase 1, 2, 2A, 2B, 3A, and 3B. Unlike other elastases, elastase 3B has little elastolytic activity. Like most of the human elastases, elastase 3B is secreted from the pancreas as a zymogen and, like other serine proteases such as trypsin, chymotrypsin and kallikrein, it has a digestive function in the intestine. Elastase 3B preferentially cleaves proteins after alanine residues. Elastase 3B may also function in the intestinal transport and metabolism of cholesterol. Both elastase 3A and elastase 3B have been referred to as protease E and as elastase 1, and excretion of this protein in fecal material is frequently used as a measure of pancreatic function in clinical assays. CELA3B NA
ENSG00000175535 5406 pancreatic lipase This gene is a member of the lipase gene family. It encodes a carboxyl esterase that hydrolyzes insoluble, emulsified triglycerides, and is essential for the efficient digestion of dietary fats. This gene is expressed specifically in the pancreas. PNLIP NA
ENSG00000250606 NA NA NA NA TRUE
ENSG00000142789 10136 chymotrypsin like elastase family member 3A Elastases form a subfamily of serine proteases that hydrolyze many proteins in addition to elastin. Humans have six elastase genes which encode the structurally similar proteins elastase 1, 2, 2A, 2B, 3A, and 3B. Unlike other elastases, elastase 3A has little elastolytic activity. Like most of the human elastases, elastase 3A is secreted from the pancreas as a zymogen and, like other serine proteases such as trypsin, chymotrypsin and kallikrein, it has a digestive function in the intestine. Elastase 3A preferentially cleaves proteins after alanine residues. Elastase 3A may also function in the intestinal transport and metabolism of cholesterol. Both elastase 3A and elastase 3B have been referred to as protease E and as elastase 1. CELA3A NA
ENSG00000179751 342898 syncollin NA SYCN NA
ENSG00000168925 1504 chymotrypsinogen B1 The protein encoded by this gene is one of a family of serine proteases that is secreted into the gastrointestinal tract as an inactive precursor, which is activated by proteolytic cleavage with trypsin. CTRB1 NA
ENSG00000172016 5068 regenerating family member 3 alpha This gene encodes a pancreatic secretory protein that may be involved in cell proliferation or differentiation. It has similarity to the C-type lectin superfamily. The enhanced expression of this gene is observed during pancreatic inflammation and liver carcinogenesis. The mature protein also functions as an antimicrobial protein with antibacterial activity. Alternate splicing results in multiple transcript variants that encode the same protein. REG3A NA
ENSG00000076864 5909 RAP1 GTPase activating protein This gene encodes a type of GTPase-activating-protein (GAP) that down-regulates the activity of the ras-related RAP1 protein. RAP1 acts as a molecular switch by cycling between an inactive GDP-bound form and an active GTP-bound form. The product of this gene, RAP1GAP, promotes the hydrolysis of bound GTP and hence returns RAP1 to the inactive state whereas other proteins, guanine nucleotide exchange factors (GEFs), act as RAP1 activators by facilitating the conversion of RAP1 from the GDP- to the GTP-bound form. In general, ras subfamily proteins, such as RAP1, play key roles in receptor-linked signaling pathways that control cell growth and differentiation. RAP1 plays a role in diverse processes such as cell proliferation, adhesion, differentiation, and embryogenesis. Alternative splicing results in multiple transcript variants encoding distinct proteins. RAP1GAP NA
ENSG00000125414 4620 myosin, heavy chain 2, skeletal muscle, adult Myosins are actin-based motor proteins that function in the generation of mechanical force in eukaryotic cells. Muscle myosins are heterohexamers composed of 2 myosin heavy chains and 2 pairs of nonidentical myosin light chains. This gene encodes a member of the class II or conventional myosin heavy chains, and functions in skeletal muscle contraction. This gene is found in a cluster of myosin heavy chain genes on chromosome 17. A mutation in this gene results in inclusion body myopathy-3. Multiple alternatively spliced variants, encoding the same protein, have been identified. MYH2 NA
ENSG00000091704 1357 carboxypeptidase A1 This gene encodes a member of the carboxypeptidase A family of zinc metalloproteases. This enzyme is produced in the pancreas and preferentially cleaves C-terminal branched-chain and aromatic amino acids from dietary proteins. This gene and several family members are present in a gene cluster on chromosome 7. Mutations in this gene may be linked to chronic pancreatitis, while elevated protein levels may be associated with pancreatic cancer. CPA1 NA
ENSG00000204983 5644 protease, serine 1 This gene encodes a trypsinogen, which is a member of the trypsin family of serine proteases. This enzyme is secreted by the pancreas and cleaved to its active form in the small intestine. It is active on peptide linkages involving the carboxyl group of lysine or arginine. Mutations in this gene are associated with hereditary pancreatitis. This gene and several other trypsinogen genes are localized to the T cell receptor beta locus on chromosome 7. PRSS1 NA
ENSG00000112210 51715 RAB23, member RAS oncogene family This gene encodes a small GTPase of the Ras superfamily. Rab proteins are involved in the regulation of diverse cellular functions associated with intracellular membrane trafficking, including autophagy and immune response to bacterial infection. The encoded protein may play a role in central nervous system development by antagonizing sonic hedgehog signaling. Disruption of this gene has been implicated in Carpenter syndrome as well as cancer. Alternative splicing results in multiple transcript variants. RAB23 NA
ENSG00000117013 9132 potassium voltage-gated channel subfamily Q member 4 The protein encoded by this gene forms a potassium channel that is thought to play a critical role in the regulation of neuronal excitability, particularly in sensory cells of the cochlea. The current generated by this channel is inhibited by M1 muscarinic acetylcholine receptors and activated by retigabine, a novel anti-convulsant drug. The encoded protein can form a homomultimeric potassium channel or possibly a heteromultimeric channel in association with the protein encoded by the KCNQ3 gene. Defects in this gene are a cause of nonsyndromic sensorineural deafness type 2 (DFNA2), an autosomal dominant form of progressive hearing loss. Two transcript variants encoding different isoforms have been found for this gene. KCNQ4 NA
ENSG00000134871 1284 collagen type IV alpha 2 This gene encodes one of the six subunits of type IV collagen, the major structural component of basement membranes. The C-terminal portion of the protein, known as canstatin, is an inhibitor of angiogenesis and tumor growth. Like the other members of the type IV collagen gene family, this gene is organized in a head-to-head conformation with another type IV collagen gene so that each gene pair shares a common promoter. COL4A2 NA
ENSG00000142615 63036 chymotrypsin like elastase family member 2A Elastases form a subfamily of serine proteases that hydrolyze many proteins in addition to elastin. Humans have six elastase genes which encode the structurally similar proteins elastase 1, 2, 2A, 2B, 3A, and 3B. Like most of the human elastases, elastase 2A is secreted from the pancreas as a zymogen. In other species, elastase 2A has been shown to preferentially cleave proteins after leucine, methionine, and phenylalanine residues. CELA2A NA
ENSG00000172403 171024 synaptopodin 2 NA SYNPO2 NA
ENSG00000238133 339751 MLK7 antisense RNA 1 NA MLK7-AS1 NA
ENSG00000169347 2813 glycoprotein 2 This gene encodes an integral membrane protein that is secreted from intracellular zymogen granules and associates with the plasma membrane via glycosylphosphatidylinositol (GPI) linkage. The encoded protein binds pathogens such as enterobacteria, thereby playing an important role in the innate immune response. The C-terminus of this protein is related to the C-terminus of the protein encoded by the neighboring gene, uromodulin (UMOD). Alternative splicing results in multiple transcript variants. GP2 NA
ENSG00000164949 2669 GTP binding protein overexpressed in skeletal muscle The protein encoded by this gene belongs to the RAD/GEM family of GTP-binding proteins. It is associated with the inner face of the plasma membrane and could play a role as a regulatory protein in receptor-mediated signal transduction. Alternative splicing occurs at this locus and two transcript variants encoding the same protein have been identified. GEM NA
ENSG00000167600 29785 cytochrome P450 family 2 subfamily S member 1 This gene encodes a member of the cytochrome P450 superfamily of enzymes. The cytochrome P450 proteins are monooxygenases which catalyze many reactions involved in drug metabolism and synthesis of cholesterol, steroids and other lipids. This protein localizes to the endoplasmic reticulum. In rodents, the homologous protein has been shown to metabolize certain carcinogens; however, the specific function of the human protein has not been determined. CYP2S1 NA
ENSG00000115590 7850 interleukin 1 receptor type 2 The protein encoded by this gene is a cytokine receptor that belongs to the interleukin 1 receptor family. This protein binds interleukin alpha (IL1A), interleukin beta (IL1B), and interleukin 1 receptor, type I (IL1R1/IL1RA), and acts as a decoy receptor that inhibits the activity of its ligands. Interleukin 4 (IL4) is reported to antagonize the activity of interleukin 1 by inducing the expression and release of this cytokine. This gene and three other genes form a cytokine receptor gene cluster on chromosome 2q12. Alternative splicing results in multiple transcript variants and protein isoforms. Alternative splicing produces both membrane-bound and soluble proteins. A soluble protein is also produced by proteolytic cleavage. IL1R2 NA
ENSG00000129521 112399 egl-9 family hypoxia inducible factor 3 NA EGLN3 NA
ENSG00000124145 6385 syndecan 4 The protein encoded by this gene is a transmembrane (type I) heparan sulfate proteoglycan that functions as a receptor in intracellular signaling. The encoded protein is found as a homodimer and is a member of the syndecan proteoglycan family. This gene is found on chromosome 20, while a pseudogene has been found on chromosome 22. SDC4 NA
ENSG00000174171 105370792 uncharacterized LOC105370792 NA LOC105370792 NA
ENSG00000170890 5319 phospholipase A2 group IB This gene encodes a secreted member of the phospholipase A2 (PLA2) class of enzymes, which is produced by the pancreatic acinar cells. The encoded calcium-dependent enzyme catalyzes the hydrolysis of the sn-2 position of membrane glycerophospholipids to release arachidonic acid (AA) and lysophospholipids. AA is subsequently converted by downstream metabolic enzymes to several bioactive lipophilic compounds (eicosanoids), including prostaglandins (PGs) and leukotrienes (LTs). The enzyme may be involved in several physiological processes including cell contraction, cell proliferation and pathological response. PLA2G1B NA
ENSG00000067191 782 calcium voltage-gated channel auxiliary subunit beta 1 The protein encoded by this gene belongs to the calcium channel beta subunit family. It plays an important role in the calcium channel by modulating G protein inhibition, increasing peak calcium current, controlling the alpha-1 subunit membrane targeting and shifting the voltage dependence of activation and inactivation. Alternative splicing occurs at this locus and three transcript variants encoding three distinct isoforms have been identified. CACNB1 NA
ENSG00000161281 1346 cytochrome c oxidase subunit 7A1 Cytochrome c oxidase (COX), the terminal component of the mitochondrial respiratory chain, catalyzes the electron transfer from reduced cytochrome c to oxygen. This component is a heteromeric complex consisting of 3 catalytic subunits encoded by mitochondrial genes and multiple structural subunits encoded by nuclear genes. The mitochondrially-encoded subunits function in electron transfer, and the nuclear-encoded subunits may function in the regulation and assembly of the complex. This nuclear gene encodes polypeptide 1 (muscle isoform) of subunit VIIa and the polypeptide 1 is present only in muscle tissues. Other polypeptides of subunit VIIa are present in both muscle and nonmuscle tissues, and are encoded by different genes. COX7A1 NA
ENSG00000174136 285704 repulsive guidance molecule family member b RGMB is a glycosylphosphatidylinositol (GPI)-anchored member of the repulsive guidance molecule family (see RGMA, MIM 607362) and contributes to the patterning of the developing nervous system (Samad et al., 2005 [PubMed 15671031]). RGMB NA
ENSG00000169442 1043 CD52 molecule NA CD52 NA
ENSG00000110880 23603 coronin 1C This gene encodes a member of the WD repeat protein family. WD repeats are minimally conserved regions of approximately 40 amino acids typically bracketed by gly-his and trp-asp (GH-WD), which may facilitate formation of heterotrimeric or multiprotein complexes. Members of this family are involved in a variety of cellular processes, including cell cycle progression, signal transduction, apoptosis, and gene regulation. Three transcript variants encoding two different isoforms have been found for this gene. CORO1C NA
ENSG00000214290 120376 colorectal cancer associated 2 NA COLCA2 NA
ENSG00000137094 25822 DnaJ heat shock protein family (Hsp40) member B5 DNAJB5 belongs to the evolutionarily conserved DNAJ/HSP40 protein family. For background information on the DNAJ family, see MIM 608375. DNAJB5 NA
ENSG00000140511 145864 hyaluronan and proteoglycan link protein 3 This gene belongs to the hyaluronan and proteoglycan binding link protein gene family. The protein encoded by this gene may function in hyaluronic acid binding and cell adhesion. HAPLN3 NA
ENSG00000158270 81035 collectin subfamily member 12 This gene encodes a member of the C-lectin family, proteins that possess collagen-like sequences and carbohydrate recognition domains. This protein is a scavenger receptor, a cell surface glycoprotein that displays several functions associated with host defense. It can bind to carbohydrate antigens on microorganisms, facilitating their recognition and removal. It also mediates the recognition, internalization, and degradation of oxidatively modified low density lipoprotein by vascular endothelial cells. COLEC12 NA
ENSG00000141086 1506 chymotrypsin like NA CTRL NA
ENSG00000142156 1291 collagen type VI alpha 1 The collagens are a superfamily of proteins that play a role in maintaining the integrity of various tissues. Collagens are extracellular matrix proteins and have a triple-helical domain as their common structural element. Collagen VI is a major structural component of microfibrils. The basic structural unit of collagen VI is a heterotrimer of the alpha1(VI), alpha2(VI), and alpha3(VI) chains. The alpha2(VI) and alpha3(VI) chains are encoded by the COL6A2 and COL6A3 genes, respectively. The protein encoded by this gene is the alpha 1 subunit of type VI collagen (alpha1(VI) chain). Mutations in the genes that code for the collagen VI subunits result in the autosomal dominant disorder, Bethlem myopathy. COL6A1 NA
ENSG00000185532 5592 protein kinase, cGMP-dependent, type I Mammals have three different isoforms of cyclic GMP-dependent protein kinase (Ialpha, Ibeta, and II). These PRKG isoforms act as key mediators of the nitric oxide/cGMP signaling pathway and are important components of many signal transduction processes in diverse cell types. This PRKG1 gene on human chromosome 10 encodes the soluble Ialpha and Ibeta isoforms of PRKG by alternative transcript splicing. A separate gene on human chromosome 4, PRKG2, encodes the membrane-bound PRKG isoform II. The PRKG1 proteins play a central role in regulating cardiovascular and neuronal functions in addition to relaxing smooth muscle tone, preventing platelet aggregation, and modulating cell growth. This gene is most strongly expressed in all types of smooth muscle, platelets, cerebellar Purkinje cells, hippocampal neurons, and the lateral amygdala. Isoforms Ialpha and Ibeta have identical cGMP-binding and catalytic domains but differ in their leucine/isoleucine zipper and autoinhibitory sequences and therefore differ in their dimerization substrates and kinase enzyme activity. PRKG1 NA
ENSG00000070190 27071 dual adaptor of phosphotyrosine and 3-phosphoinositides 1 NA DAPP1 NA
ENSG00000023902 51177 pleckstrin homology domain containing O1 NA PLEKHO1 NA
ENSG00000244945 101928445 uncharacterized LOC101928445 NA LOC101928445 NA
ENSG00000224597 102724316 SVIL antisense RNA 1 NA SVIL-AS1 NA
ENSG00000245864 ENSG00000245864 NA NA CTC-467M3.1 NA
ENSG00000135346 1081 glycoprotein hormones, alpha polypeptide The four human glycoprotein hormones chorionic gonadotropin (CG), luteinizing hormone (LH), follicle stimulating hormone (FSH), and thyroid stimulating hormone (TSH) are dimers consisting of alpha and beta subunits that are associated noncovalently. The alpha subunits of these hormones are identical, however, their beta chains are unique and confer biological specificity. The protein encoded by this gene is the alpha subunit and belongs to the glycoprotein hormones alpha chain family. Two transcript variants encoding different isoforms have been found for this gene. CGA NA
ENSG00000118496 84085 F-box protein 30 This gene encodes a member of the F-box protein family which is characterized by an approximately 40 amino acid motif, the F-box. The F-box proteins constitute one of the four subunits of the ubiquitin protein ligase complex called SCFs (SKP1-cullin-F-box), which function in phosphorylation-dependent ubiquitination. The F-box proteins are divided into 3 classes: Fbws containing WD-40 domains, Fbls containing leucine-rich repeats, and Fbxs containing either different protein-protein interaction modules or no recognizable motifs. The protein encoded by this gene belongs to the Fbxs class and it is upregulated in nasopharyngeal carcinoma. FBXO30 NA
ENSG00000158516 1358 carboxypeptidase A2 Three different forms of human pancreatic procarboxypeptidase A have been isolated. The encoded protein represents the A2 form, which is a monomeric protein with different biochemical properties from the A1 and A3 forms. The A2 form of pancreatic procarboxypeptidase acts on aromatic C-terminal residues and is a secreted protein. CPA2 NA
ENSG00000213639 5500 protein phosphatase 1 catalytic subunit beta The protein encoded by this gene is one of the three catalytic subunits of protein phosphatase 1 (PP1). PP1 is a serine/threonine specific protein phosphatase known to be involved in the regulation of a variety of cellular processes, such as cell division, glycogen metabolism, muscle contractility, protein synthesis, and HIV-1 viral transcription. Mouse studies suggest that PP1 functions as a suppressor of learning and memory. Two alternatively spliced transcript variants encoding distinct isoforms have been observed. PPP1CB NA
ENSG00000148516 6935 zinc finger E-box binding homeobox 1 This gene encodes a zinc finger transcription factor. The encoded protein likely plays a role in transcriptional repression of interleukin 2. Mutations in this gene have been associated with posterior polymorphous corneal dystrophy-3 and late-onset Fuchs endothelial corneal dystrophy. Alternatively spliced transcript variants encoding different isoforms have been described. ZEB1 NA
ENSG00000153002 1360 carboxypeptidase B1 Three different procarboxypeptidases A and two different procarboxypeptidases B have been isolated. The B1 and B2 forms differ from each other mainly in isoelectric point. Carboxypeptidase B1 is a highly tissue-specific protein and is a useful serum marker for acute pancreatitis and dysfunction of pancreatic transplants. It is not elevated in pancreatic carcinoma. CPB1 NA
ENSG00000112208 9532 BCL2 associated athanogene 2 BAG proteins compete with Hip for binding to the Hsc70/Hsp70 ATPase domain and promote substrate release. All the BAG proteins have an approximately 45-amino acid BAG domain near the C terminus but differ markedly in their N-terminal regions. The predicted BAG2 protein contains 211 amino acids. The BAG domains of BAG1, BAG2, and BAG3 interact specifically with the Hsc70 ATPase domain in vitro and in mammalian cells. All 3 proteins bind with high affinity to the ATPase domain of Hsc70 and inhibit its chaperone activity in a Hip-repressible manner. BAG2 NA
ENSG00000100342 8542 apolipoprotein L1 This gene encodes a secreted high density lipoprotein which binds to apolipoprotein A-I. Apolipoprotein A-I is a relatively abundant plasma protein and is the major apoprotein of HDL. It is involved in the formation of most cholesteryl esters in plasma and also promotes efflux of cholesterol from cells. This apolipoprotein L family member may play a role in lipid exchange and transport throughout the body, as well as in reverse cholesterol transport from peripheral cells to the liver. Several different transcript variants encoding different isoforms have been found for this gene. APOL1 NA
ENSG00000086015 23139 microtubule associated serine/threonine kinase 2 NA MAST2 NA
ENSG00000175899 2 alpha-2-macroglobulin Alpha-2-macroglobulin is a protease inhibitor and cytokine transporter. It inhibits many proteases, including trypsin, thrombin and collagenase. A2M is implicated in Alzheimer disease (AD) due to its ability to mediate the clearance and degradation of A-beta, the major component of beta-amyloid deposits. A2M NA
ENSG00000152268 NA NA NA NA TRUE
ENSG00000213144 ENSG00000213144 NA NA RP11-64B16.2 NA
ENSG00000124831 9208 leucine rich repeat (in FLII) interacting protein 1 NA LRRFIP1 NA
ENSG00000121057 8165 A-kinase anchoring protein 1 The A-kinase anchor proteins (AKAPs) are a group of structurally diverse proteins, which have the common function of binding to the regulatory subunit of protein kinase A (PKA) and confining the holoenzyme to discrete locations within the cell. This gene encodes a member of the AKAP family. The encoded protein binds to type I and type II regulatory subunits of PKA and anchors them to the mitochondrion. This protein is speculated to be involved in the cAMP-dependent signal transduction pathway and in directing RNA to a specific cellular compartment. AKAP1 NA
ENSG00000173641 27129 heat shock protein family B (small) member 7 NA HSPB7 NA
ENSG00000164078 4486 macrophage stimulating 1 receptor This gene encodes a cell surface receptor for macrophage-stimulating protein (MSP) with tyrosine kinase activity. The mature form of this protein is a heterodimer of disulfide-linked alpha and beta subunits, generated by proteolytic cleavage of a single-chain precursor. The beta subunit undergoes tyrosine phosphorylation upon stimulation by MSP. This protein is expressed on the ciliated epithelia of the mucociliary transport apparatus of the lung, and together with MSP, thought to be involved in host defense. Alternative splicing generates multiple transcript variants encoding different isoforms that may undergo similar proteolytic processing. MST1R NA
ENSG00000187498 1282 collagen type IV alpha 1 chain This gene encodes a type IV collagen alpha protein. Type IV collagen proteins are integral components of basement membranes. This gene shares a bidirectional promoter with a paralogous gene on the opposite strand. The protein consists of an amino-terminal 7S domain, a triple-helix forming collagenous domain, and a carboxy-terminal non-collagenous domain. It functions as part of a heterotrimer and interacts with other extracellular matrix components such as perlecans, proteoglycans, and laminins. In addition, proteolytic cleavage of the non-collagenous carboxy-terminal domain results in a biologically active fragment known as arresten, which has anti-angiogenic and tumor suppressor properties. Mutations in this gene cause porencephaly, cerebrovascular disease, and renal and muscular defects. Alternative splicing results in multiple transcript variants. COL4A1 NA
ENSG00000107438 9124 PDZ and LIM domain 1 This gene encodes a member of the enigma protein family. The protein contains two protein interacting domains, a PDZ domain at the amino terminal end and one to three LIM domains at the carboxyl terminal. It is a cytoplasmic protein associated with the cytoskeleton. The protein may function as an adapter to bring other LIM-interacting proteins to the cytoskeleton. Pseudogenes associated with this gene are located on chromosomes 3, 14 and 17. PDLIM1 NA
ENSG00000148498 56288 par-3 family cell polarity regulator This gene encodes a member of the PARD protein family. PARD family members interact with other PARD family members and other proteins; they affect asymmetrical cell division and direct polarized cell growth. Multiple alternatively spliced transcript variants have been described for this gene. PARD3 NA
ENSG00000119938 5507 protein phosphatase 1 regulatory subunit 3C This gene encodes a regulatory subunit of protein phosphatase-1 (PP1). PP1 catalyzes reversible protein phosphorylation, which is important in a wide range of cellular activities: neuronal, muscular, RNA splicing, protein synthesis, cell death, and glycogen metabolism, to name just a few. By interacting with different regulatory subunits, PP1 is directed to different parts of the cell, to different substrates, or to respond to extracellular signals. PPP1R3C NA
ENSG00000157404 3815 KIT proto-oncogene receptor tyrosine kinase This gene encodes the human homolog of the proto-oncogene c-kit. C-kit was first identified as the cellular homolog of the feline sarcoma viral oncogene v-kit. This protein is a type 3 transmembrane receptor for MGF (mast cell growth factor, also known as stem cell factor). Mutations in this gene are associated with gastrointestinal stromal tumors, mast cell disease, acute myelogenous leukemia, and piebaldism. Multiple transcript variants encoding different isoforms have been found for this gene. KIT NA
ENSG00000119686 55640 feline leukemia virus subgroup C cellular receptor family member 2 This gene encodes a member of the major facilitator superfamily. The encoded transmembrane protein is a calcium transporter. Unlike the related protein feline leukemia virus subgroup C receptor 1, the protein encoded by this locus does not bind to feline leukemia virus subgroup C envelope protein. The encoded protein may play a role in development of brain vascular endothelial cells, as mutations at this locus have been associated with proliferative vasculopathy and hydranencephaly-hydrocephaly syndrome. Alternatively spliced transcript variants have been described. FLVCR2 NA
ENSG00000154096 7070 Thy-1 cell surface antigen This gene encodes a cell surface glycoprotein and member of the immunoglobulin superfamily of proteins. The encoded protein is involved in cell adhesion and cell communication in numerous cell types, but particularly in cells of the immune and nervous systems. The encoded protein is widely used as a marker for hematopoietic stem cells. This gene may function as a tumor suppressor in nasopharyngeal carcinoma. Alternative splicing results in multiple transcript variants. THY1 NA
ENSG00000268364 ENSG00000268364 SMC5 antisense RNA 1 (head to head) NA SMC5-AS1 NA
ENSG00000160200 875 cystathionine-beta-synthase The protein encoded by this gene acts as a homotetramer to catalyze the conversion of homocysteine to cystathionine, the first step in the transsulfuration pathway. The encoded protein is allosterically activated by adenosyl-methionine and uses pyridoxal phosphate as a cofactor. Defects in this gene can cause cystathionine beta-synthase deficiency (CBSD), which can lead to homocystinuria. This gene is a major contributor to cellular hydrogen sulfide production. Multiple alternatively spliced transcript variants have been found for this gene. CBS NA
ENSG00000110799 7450 von Willebrand factor This gene encodes a glycoprotein involved in hemostasis. The encoded preproprotein is proteolytically processed following assembly into large multimeric complexes. These complexes function in the adhesion of platelets to sites of vascular injury and the transport of various proteins in the blood. Mutations in this gene result in von Willebrand disease, an inherited bleeding disorder. An unprocessed pseudogene has been found on chromosome 22. VWF NA
ENSG00000109061 4619 myosin, heavy chain 1, skeletal muscle, adult Myosin is a major contractile protein which converts chemical energy into mechanical energy through the hydrolysis of ATP. Myosin is a hexameric protein composed of a pair of myosin heavy chains (MYH) and two pairs of nonidentical light chains. Myosin heavy chains are encoded by a multigene family. In mammals at least 10 different myosin heavy chain (MYH) isoforms have been described from striated, smooth, and nonmuscle cells. These isoforms show expression that is spatially and temporally regulated during development. MYH1 NA
ENSG00000174306 23051 zinc fingers and homeoboxes 3 This gene encodes a member of the zinc fingers and homeoboxes (ZHX) gene family. The encoded protein contains two C2H2-type zinc fingers and five homeodomains and forms a dimer with itself or with zinc fingers and homeoboxes family member 1. In the nucleus, the dimerized protein interacts with the A subunit of the ubiquitous transcription factor nuclear factor-Y and may function as a transcriptional repressor. ZHX3 NA
ENSG00000127824 7277 tubulin alpha 4a Microtubules of the eukaryotic cytoskeleton perform essential and diverse functions and are composed of a heterodimer of alpha and beta tubulin. The genes encoding these microtubule constituents are part of the tubulin superfamily, which is composed of six distinct families. Genes from the alpha, beta and gamma tubulin families are found in all eukaryotes. The alpha and beta tubulins represent the major components of microtubules, while gamma tubulin plays a critical role in the nucleation of microtubule assembly. There are multiple alpha and beta tubulin genes and they are highly conserved among and between species. This gene encodes an alpha tubulin that is a highly conserved homolog of a rat testis-specific alpha tubulin. Alternatively spliced transcript variants encoding different isoforms have been found for this gene. TUBA4A NA
ENSG00000140416 7168 tropomyosin 1 (alpha) This gene is a member of the tropomyosin family of highly conserved, widely distributed actin-binding proteins involved in the contractile system of striated and smooth muscles and the cytoskeleton of non-muscle cells. Tropomyosin is composed of two alpha-helical chains arranged as a coiled-coil. It is polymerized end to end along the two grooves of actin filaments and provides stability to the filaments. The encoded protein is one type of alpha helical chain that forms the predominant tropomyosin of striated muscle, where it also functions in association with the troponin complex to regulate the calcium-dependent interaction of actin and myosin during muscle contraction. In smooth muscle and non-muscle cells, alternatively spliced transcript variants encoding a range of isoforms have been described. Mutations in this gene are associated with type 3 familial hypertrophic cardiomyopathy. TPM1 NA
ENSG00000167617 148170 CDC42 effector protein 5 Cell division control protein 42 (CDC42), a small Rho GTPase, regulates the formation of F-actin-containing structures through its interaction with the downstream effector proteins. The protein encoded by this gene is a member of the Borg (binder of Rho GTPases) family of CDC42 effector proteins. Borg family proteins contain a CRIB (Cdc42/Rac interactive-binding) domain. They bind to CDC42 and regulate its function negatively. The encoded protein may inhibit c-Jun N-terminal kinase (JNK) independently of CDC42 binding. The protein may also play a role in septin organization and inducing pseudopodia formation in fibroblasts. CDC42EP5 NA
ENSG00000111371 81539 solute carrier family 38 member 1 Amino acid transporters play essential roles in the uptake of nutrients, production of energy, chemical metabolism, detoxification, and neurotransmitter cycling. SLC38A1 is an important transporter of glutamine, an intermediate in the detoxification of ammonia and the production of urea. Glutamine serves as a precursor for the synaptic transmitter, glutamate (Gu et al., 2001 [PubMed 11325958]). SLC38A1 NA
ENSG00000234175 ENSG00000234175 NA NA RP11-730A19.9 NA
ENSG00000184113 7122 claudin 5 This gene encodes a member of the claudin family. Claudins are integral membrane proteins and components of tight junction strands. Tight junction strands serve as a physical barrier to prevent solutes and water from passing freely through the paracellular space between epithelial or endothelial cell sheets. Mutations in this gene have been found in patients with velocardiofacial syndrome. Alternatively spliced transcript variants encoding the same protein have been found for this gene. CLDN5 NA
ENSG00000267328 ENSG00000267328 NA NA AC002398.12 NA
ENSG00000168484 6440 surfactant protein C This gene encodes the pulmonary-associated surfactant protein C (SPC), an extremely hydrophobic surfactant protein essential for lung function and homeostasis after birth. Pulmonary surfactant is a surface-active lipoprotein complex composed of 90% lipids and 10% proteins which include plasma proteins and apolipoproteins SPA, SPB, SPC and SPD. The surfactant is secreted by the alveolar cells of the lung and maintains the stability of pulmonary tissue by reducing the surface tension of fluids that coat the lung. Multiple mutations in this gene have been identified, which cause pulmonary surfactant metabolism dysfunction type 2, also called pulmonary alveolar proteinosis due to surfactant protein C deficiency, and are associated with interstitial lung disease in older infants, children, and adults. Alternatively spliced transcript variants encoding different protein isoforms have been identified. SFTPC NA
ENSG00000107130 23413 neuronal calcium sensor 1 This gene is a member of the neuronal calcium sensor gene family, which encode calcium-binding proteins expressed predominantly in neurons. The protein encoded by this gene regulates G protein-coupled receptor phosphorylation in a calcium-dependent manner and can substitute for calmodulin. The protein is associated with secretory granules and modulates synaptic transmission and synaptic plasticity. Multiple transcript variants encoding different isoforms have been found for this gene. NCS1 NA
ENSG00000257410 ENSG00000257410 NA NA RP11-2H8.2 NA
ENSG00000244274 55861 dysbindin domain containing 2 NA DBNDD2 NA
ENSG00000198668 801 calmodulin 1 (phosphorylase kinase, delta) This gene encodes a member of the EF-hand calcium-binding protein family. It is one of three genes which encode an identical calcium binding protein which is one of the four subunits of phosphorylase kinase. Two pseudogenes have been identified on chromosome 7 and X. Multiple transcript variants encoding different isoforms have been found for this gene. CALM1 NA
ENSG00000198668 805 calmodulin 2 (phosphorylase kinase, delta) This gene is a member of the calmodulin gene family. There are three distinct calmodulin genes dispersed throughout the genome that encode the identical protein, but differ at the nucleotide level. Calmodulin is a calcium binding protein that plays a role in signaling pathways, cell cycle progression and proliferation. Several infants with severe forms of long-QT syndrome (LQTS) who displayed life-threatening ventricular arrhythmias together with delayed neurodevelopment and epilepsy were found to have mutations in either this gene or another member of the calmodulin gene family (PMID:23388215). Mutations in this gene have also been identified in patients with less severe forms of LQTS (PMID:24917665), while mutations in another calmodulin gene family member have been associated with catecholaminergic polymorphic ventricular tachycardia (CPVT)(PMID:23040497), a rare disorder thought to be the cause of a significant fraction of sudden cardiac deaths in young individuals. Pseudogenes of this gene are found on chromosomes 10, 13, and 17. Alternative splicing results in multiple transcript variants encoding different isoforms. CALM2 NA
ENSG00000100784 9252 ribosomal protein S6 kinase A5 NA RPS6KA5 NA
ENSG00000266101 ENSG00000266101 NA NA RP5-906A24.2 NA
ENSG00000213165 NA NA NA NA TRUE
ENSG00000163702 84818 interleukin 17 receptor C This gene encodes a single-pass type I membrane protein that shares similarity with the interleukin-17 receptor (IL-17RA). Unlike IL-17RA, which is predominantly expressed in hemopoietic cells, and binds with high affinity to only IL-17A, this protein is expressed in nonhemopoietic tissues, and binds both IL-17A and IL-17F with similar affinities. The proinflammatory cytokines, IL-17A and IL-17F, have been implicated in the progression of inflammatory and autoimmune diseases. Multiple alternatively spliced transcript variants encoding different isoforms have been detected for this gene, and it has been proposed that soluble, secreted proteins lacking transmembrane and intracellular domains may function as extracellular antagonists to cytokine signaling. IL17RC NA
ENSG00000097007 25 ABL proto-oncogene 1, non-receptor tyrosine kinase This gene is a protooncogene that encodes a protein tyrosine kinase involved in a variety of cellular processes, including cell division, adhesion, differentiation, and response to stress. The activity of the protein is negatively regulated by its SH3 domain, whereby deletion of the region encoding this domain results in an oncogene. The ubiquitously expressed protein has DNA-binding activity that is regulated by CDC2-mediated phosphorylation, suggesting a cell cycle function. This gene has been found fused to a variety of translocation partner genes in various leukemias, most notably the t(9;22) translocation that results in a fusion with the 5’ end of the breakpoint cluster region gene (BCR; MIM:151410). Alternative splicing of this gene results in two transcript variants, which contain alternative first exons that are spliced to the remaining common exons. ABL1 NA
ENSG00000183092 57596 brain enriched guanylate kinase associated NA BEGAIN NA
ENSG00000111907 7164 tumor protein D52-like 1 This gene encodes a member of a family of proteins that contain coiled-coil domains and may form hetero- or homomers. The encoded protein is involved in cell proliferation and calcium signaling. It also interacts with the mitogen-activated protein kinase kinase kinase 5 (MAP3K5/ASK1) and positively regulates MAP3K5-induced apoptosis. Multiple alternatively spliced transcript variants have been observed. TPD52L1 NA
ENSG00000118785 6696 secreted phosphoprotein 1 The protein encoded by this gene is involved in the attachment of osteoclasts to the mineralized bone matrix. The encoded protein is secreted and binds hydroxyapatite with high affinity. The osteoclast vitronectin receptor is found in the cell membrane and may be involved in the binding to this protein. This protein is also a cytokine that upregulates expression of interferon-gamma and interleukin-12. Several transcript variants encoding different isoforms have been found for this gene. SPP1 NA
ENSG00000183044 18 4-aminobutyrate aminotransferase 4-aminobutyrate aminotransferase (ABAT) is responsible for catabolism of gamma-aminobutyric acid (GABA), an important, mostly inhibitory neurotransmitter in the central nervous system, into succinic semialdehyde. The active enzyme is a homodimer of 50-kD subunits complexed to pyridoxal-5-phosphate. The protein sequence is over 95% similar to the pig protein. GABA is estimated to be present in nearly one-third of human synapses. ABAT in liver and brain is controlled by 2 codominant alleles with a frequency in a Caucasian population of 0.56 and 0.44. The ABAT deficiency phenotype includes psychomotor retardation, hypotonia, hyperreflexia, lethargy, refractory seizures, and EEG abnormalities. Multiple alternatively spliced transcript variants encoding the same protein isoform have been found for this gene. ABAT NA
ENSG00000170667 100271927 RAS p21 protein activator 4B NA RASA4B NA
ENSG00000255112 57132 charged multivesicular body protein 1B CHMP1B belongs to the chromatin-modifying protein/charged multivesicular body protein (CHMP) family. These proteins are components of ESCRT-III (endosomal sorting complex required for transport III), a complex involved in degradation of surface receptor proteins and formation of endocytic multivesicular bodies (MVBs). Some CHMPs have both nuclear and cytoplasmic/vesicular distributions, and one such CHMP, CHMP1A (MIM 164010), is required for both MVB formation and regulation of cell cycle progression (Tsang et al., 2006 [PubMed 16730941]). CHMP1B NA
ENSG00000004776 126393 heat shock protein family B (small) member 6 This locus encodes a heat shock protein. The encoded protein likely plays a role in smooth muscle relaxation. HSPB6 NA
ENSG00000131242 84440 RAB11 family interacting protein 4 Proteins of the large Rab GTPase family (see RAB1A; MIM 179508) have regulatory roles in the formation, targeting, and fusion of intracellular transport vesicles. RAB11FIP4 is one of many proteins that interact with and regulate Rab GTPases (Hales et al., 2001 [PubMed 11495908]). RAB11FIP4 NA
# Save the queried Ensembl IDs for factor 17, one per line.
write.table(as.factor(out$query),
            paste0("../utilities/GTEX2013_sparse_load_voom/gene_names_clus_", 17, ".txt"),
            col.names = FALSE, row.names = FALSE, quote = FALSE)

Factor 18 Annotations

out <- mygene::queryMany(gene_list[18,],  scopes="ensembl.gene", fields=c("name", "summary", "symbol"), species="human");
## Finished
kable(as.data.frame(out))
symbol X_id query name summary
PRSS27 83886 ENSG00000172382 protease, serine 27 This gene is located within a large protease gene cluster on chromosome 16. It belongs to the group-1 subfamily of serine proteases. The encoded protein is a secreted tryptic serine protease and is expressed mainly in the pancreas. Alternative splicing results in multiple transcript variants.
MAL 4118 ENSG00000172005 mal, T-cell differentiation protein The protein encoded by this gene is a highly hydrophobic integral membrane protein belonging to the MAL family of proteolipids. The protein has been localized to the endoplasmic reticulum of T-cells and is a candidate linker protein in T-cell signal transduction. In addition, this proteolipid is localized in compact myelin of cells in the nervous system and has been implicated in myelin biogenesis and/or function. The protein plays a role in the formation, stabilization and maintenance of glycosphingolipid-enriched membrane microdomains. Down-regulation of this gene has been associated with a variety of human epithelial malignancies. Alternative splicing produces four transcript variants which vary from each other by the presence or absence of alternatively spliced exons 2 and 3.
TRIM29 23650 ENSG00000137699 tripartite motif containing 29 The protein encoded by this gene belongs to the TRIM protein family. It has multiple zinc finger motifs and a leucine zipper motif. It has been proposed to form homo- or heterodimers which are involved in nucleic acid binding. Thus, it may act as a transcriptional regulatory factor involved in carcinogenesis and/or differentiation. It may also function in the suppression of radiosensitivity since it is associated with ataxia telangiectasia phenotype.
TGM1 7051 ENSG00000092295 transglutaminase 1 The protein encoded by this gene is a membrane protein that catalyzes the addition of an alkyl group from an alkylamine to a glutamine residue of a protein, forming an alkylglutamine in the protein. This protein alkylation leads to crosslinking of proteins and catenation of polyamines to proteins. This gene contains either one or two copies of a 22 nt repeat unit in its 3’ UTR. Mutations in this gene have been associated with autosomal recessive lamellar ichthyosis (LI) and nonbullous congenital ichthyosiform erythroderma (NCIE).
S100A2 6273 ENSG00000196754 S100 calcium binding protein A2 The protein encoded by this gene is a member of the S100 family of proteins containing 2 EF-hand calcium-binding motifs. S100 proteins are localized in the cytoplasm and/or nucleus of a wide range of cells, and involved in the regulation of a number of cellular processes such as cell cycle progression and differentiation. S100 genes include at least 13 members which are located as a cluster on chromosome 1q21. This protein may have a tumor suppressor function. Chromosomal rearrangements and altered expression of this gene have been implicated in breast cancer.
GJB2 2706 ENSG00000165474 gap junction protein beta 2 This gene encodes a member of the gap junction protein family. The gap junctions were first characterized by electron microscopy as regionally specialized structures on plasma membranes of contacting adherent cells. These structures were shown to consist of cell-to-cell channels that facilitate the transfer of ions and small molecules between cells. The gap junction proteins, also known as connexins, purified from fractions of enriched gap junctions from different tissues differ. According to sequence similarities at the nucleotide and amino acid levels, the gap junction proteins are divided into two categories, alpha and beta. Mutations in this gene are responsible for as much as 50% of pre-lingual, recessive deafness.
PTK6 5753 ENSG00000101213 protein tyrosine kinase 6 The protein encoded by this gene is a cytoplasmic nonreceptor protein kinase which may function as an intracellular signal transducer in epithelial tissues. Overexpression of this gene in mammary epithelial cells leads to sensitization of the cells to epidermal growth factor and results in a partially transformed phenotype. Expression of this gene has been detected at low levels in some breast tumors but not in normal breast tissue. The encoded protein has been shown to undergo autophosphorylation. Alternative splicing results in multiple transcript variants.
SPINK5 11005 ENSG00000133710 serine peptidase inhibitor, Kazal type 5 This gene encodes a multidomain serine protease inhibitor that contains 15 potential inhibitory domains. The encoded preproprotein is proteolytically processed to generate multiple protein products, which may exhibit unique activities and specificities. These proteins may play a role in skin and hair morphogenesis, as well as anti-inflammatory and antimicrobial protection of mucous epithelia. Mutations in this gene may result in Netherton syndrome, a disorder characterized by ichthyosis, defective cornification, and atopy. This gene is present in a gene cluster on chromosome 5. Alternative splicing results in multiple transcript variants.
VSIG10L 147645 ENSG00000186806 V-set and immunoglobulin domain containing 10 like NA
LYPD3 27076 ENSG00000124466 LY6/PLAUR domain containing 3 NA
TTC9 23508 ENSG00000133985 tetratricopeptide repeat domain 9 This gene encodes a protein that contains three tetratricopeptide repeats. The gene has been shown to be hormonally regulated in breast cancer cells and may play a role in cancer cell invasion and metastasis.
CRABP2 1382 ENSG00000143320 cellular retinoic acid binding protein 2 This gene encodes a member of the retinoic acid (RA, a form of vitamin A) binding protein family and lipocalin/cytosolic fatty-acid binding protein family. The protein is a cytosol-to-nuclear shuttling protein, which facilitates RA binding to its cognate receptor complex and transfer to the nucleus. It is involved in the retinoid signaling pathway, and is associated with increased circulating low-density lipoprotein cholesterol. Alternatively spliced transcript variants encoding the same protein have been found for this gene.
CTC-251D13.1 ENSG00000271795 ENSG00000271795 NA NA
CYSRT1 375791 ENSG00000197191 cysteine rich tail 1 NA
DEGS2 123099 ENSG00000168350 delta(4)-desaturase, sphingolipid 2 This gene encodes a bifunctional enzyme that is involved in the biosynthesis of phytosphingolipids in human skin and in other phytosphingolipid-containing tissues. This enzyme can act as a sphingolipid delta(4)-desaturase, and also as a sphingolipid C4-hydroxylase.
CSTB 1476 ENSG00000160213 cystatin B The cystatin superfamily encompasses proteins that contain multiple cystatin-like sequences. Some of the members are active cysteine protease inhibitors, while others have lost or perhaps never acquired this inhibitory activity. There are three inhibitory families in the superfamily, including the type 1 cystatins (stefins), type 2 cystatins and kininogens. This gene encodes a stefin that functions as an intracellular thiol protease inhibitor. The protein is able to form a dimer stabilized by noncovalent forces, inhibiting papain and cathepsins L, H and B. The protein is thought to play a role in protecting against the proteases leaking from lysosomes. Evidence indicates that mutations in this gene are responsible for the primary defects in patients with progressive myoclonic epilepsy (EPM1).
KRT19 3880 ENSG00000171345 keratin 19 The protein encoded by this gene is a member of the keratin family. The keratins are intermediate filament proteins responsible for the structural integrity of epithelial cells and are subdivided into cytokeratins and hair keratins. The type I cytokeratins consist of acidic proteins which are arranged in pairs of heterotypic keratin chains. Unlike its related family members, this smallest known acidic cytokeratin is not paired with a basic cytokeratin in epithelial cells. It is specifically expressed in the periderm, the transiently superficial layer that envelopes the developing epidermis. The type I cytokeratins are clustered in a region of chromosome 17q12-q21.
PSCA 8000 ENSG00000167653 prostate stem cell antigen This gene encodes a glycosylphosphatidylinositol-anchored cell membrane glycoprotein. In addition to being highly expressed in the prostate it is also expressed in the bladder, placenta, colon, kidney, and stomach. This gene is up-regulated in a large proportion of prostate cancers and is also detected in cancers of the bladder and pancreas. This gene includes a polymorphism that results in an upstream start codon in some individuals; this polymorphism is thought to be associated with a risk for certain gastric and bladder cancers. Alternative splicing results in multiple transcript variants.
TIAM1 7074 ENSG00000156299 T-cell lymphoma invasion and metastasis 1 NA
AIF1L 83543 ENSG00000126878 allograft inflammatory factor 1 like NA
ECM1 1893 ENSG00000143369 extracellular matrix protein 1 This gene encodes a soluble protein that is involved in endochondral bone formation, angiogenesis, and tumor biology. It also interacts with a variety of extracellular and structural proteins, contributing to the maintenance of skin integrity and homeostasis. Mutations in this gene are associated with lipoid proteinosis disorder (also known as hyalinosis cutis et mucosae or Urbach-Wiethe disease) that is characterized by generalized thickening of skin, mucosae and certain viscera. Alternatively spliced transcript variants encoding distinct isoforms have been described for this gene.
S100A8 6279 ENSG00000143546 S100 calcium binding protein A8 The protein encoded by this gene is a member of the S100 family of proteins containing 2 EF-hand calcium-binding motifs. S100 proteins are localized in the cytoplasm and/or nucleus of a wide range of cells, and involved in the regulation of a number of cellular processes such as cell cycle progression and differentiation. S100 genes include at least 13 members which are located as a cluster on chromosome 1q21. This protein may function in the inhibition of casein kinase and as a cytokine. Altered expression of this protein is associated with the disease cystic fibrosis. Multiple transcript variants encoding different isoforms have been found for this gene.
ALDH3A1 218 ENSG00000108602 aldehyde dehydrogenase 3 family member A1 Aldehyde dehydrogenases oxidize various aldehydes to the corresponding acids. They are involved in the detoxification of alcohol-derived acetaldehyde and in the metabolism of corticosteroids, biogenic amines, neurotransmitters, and lipid peroxidation. The enzyme encoded by this gene forms a cytoplasmic homodimer that preferentially oxidizes aromatic and medium-chain (6 carbons or more) saturated and unsaturated aldehyde substrates. It is thought to promote resistance to UV and 4-hydroxy-2-nonenal-induced oxidative damage in the cornea. The gene is located within the Smith-Magenis syndrome region on chromosome 17. Multiple alternatively spliced variants, encoding the same protein, have been identified.
S100A14 57402 ENSG00000189334 S100 calcium binding protein A14 This gene encodes a member of the S100 protein family which contains an EF-hand motif and binds calcium. The gene is located in a cluster of S100 genes on chromosome 1. Levels of the encoded protein have been found to be lower in cancerous tissue and associated with metastasis suggesting a tumor suppressor function (PMID: 19956863, 19351828).
CNFN 84518 ENSG00000105427 cornifelin NA
MREG 55686 ENSG00000118242 melanoregulin NA
RP11-20D14.6 ENSG00000249790 ENSG00000249790 NA NA
GEM 2669 ENSG00000164949 GTP binding protein overexpressed in skeletal muscle The protein encoded by this gene belongs to the RAD/GEM family of GTP-binding proteins. It is associated with the inner face of the plasma membrane and could play a role as a regulatory protein in receptor-mediated signal transduction. Alternative splicing occurs at this locus and two transcript variants encoding the same protein have been identified.
SAA1 6288 ENSG00000173432 serum amyloid A1 This gene encodes a member of the serum amyloid A family of apolipoproteins. The encoded preproprotein is proteolytically processed to generate the mature protein. This protein is a major acute phase protein that is highly expressed in response to inflammation and tissue injury. This protein also plays an important role in HDL metabolism and cholesterol homeostasis. High levels of this protein are associated with chronic inflammatory diseases including atherosclerosis, rheumatoid arthritis, Alzheimer’s disease and Crohn’s disease. This protein may also be a potential biomarker for certain tumors. Alternate splicing results in multiple transcript variants that encode the same protein. A pseudogene of this gene is found on chromosome 11.
DUOX1 53905 ENSG00000137857 dual oxidase 1 The protein encoded by this gene is a glycoprotein and a member of the NADPH oxidase family. The synthesis of thyroid hormone is catalyzed by a protein complex located at the apical membrane of thyroid follicular cells. This complex contains an iodide transporter, thyroperoxidase, and a peroxide generating system that includes proteins encoded by this gene and the similar DUOX2 gene. This protein is known as dual oxidase because it has both a peroxidase homology domain and a gp91phox domain. This protein generates hydrogen peroxide and thereby plays a role in the activity of thyroid peroxidase, lactoperoxidase, and in lactoperoxidase-mediated antimicrobial defense at mucosal surfaces. Two alternatively spliced transcript variants encoding the same protein have been described for this gene.
PPL 5493 ENSG00000118898 periplakin The protein encoded by this gene is a component of desmosomes and of the epidermal cornified envelope in keratinocytes. The N-terminal domain of this protein interacts with the plasma membrane and its C-terminus interacts with intermediate filaments. Through its rod domain, this protein forms complexes with envoplakin. This protein may serve as a link between the cornified envelope and desmosomes as well as intermediate filaments. AKT1/PKB, a protein kinase mediating a variety of cell growth and survival signaling processes, is reported to interact with this protein, suggesting a possible role for this protein as a localization signal in AKT1-mediated signaling.
GRHL1 29841 ENSG00000134317 grainyhead like transcription factor 1 This gene encodes a member of the grainyhead family of transcription factors. The encoded protein can exist as a homodimer or can form heterodimers with sister-of-mammalian grainyhead or brother-of-mammalian grainyhead. This protein functions as a transcription factor during development.
RP11-67L3.5 ENSG00000242396 ENSG00000242396 NA NA
IL20RB 53833 ENSG00000174564 interleukin 20 receptor subunit beta IL20RB and IL20RA (MIM 605620) form a heterodimeric receptor for interleukin-20 (IL20; MIM 605619) (Blumberg et al., 2001 [PubMed 11163236]).
SULT2B1 6820 ENSG00000088002 sulfotransferase family 2B member 1 Sulfotransferase enzymes catalyze the sulfate conjugation of many hormones, neurotransmitters, drugs, and xenobiotic compounds. These cytosolic enzymes are different in their tissue distributions and substrate specificities. The gene structure (number and length of exons) is similar among family members. This gene sulfates dehydroepiandrosterone but not 4-nitrophenol, a typical substrate for the phenol and estrogen sulfotransferase subfamilies. Two alternatively spliced variants that encode different isoforms have been described.
CSTA 1475 ENSG00000121552 cystatin A The cystatin superfamily encompasses proteins that contain multiple cystatin-like sequences. Some of the members are active cysteine protease inhibitors, while others have lost or perhaps never acquired this inhibitory activity. There are three inhibitory families in the superfamily, including the type 1 cystatins (stefins), type 2 cystatins, and kininogens. This gene encodes a stefin that functions as a cysteine protease inhibitor, forming tight complexes with papain and the cathepsins B, H, and L. The protein is one of the precursor proteins of cornified cell envelope in keratinocytes and plays a role in epidermal development and maintenance. Stefins have been proposed as prognostic and diagnostic tools for cancer.
PHACTR1 221692 ENSG00000112137 phosphatase and actin regulator 1 The protein encoded by this gene is a member of the phosphatase and actin regulator family of proteins. This family member can bind actin and regulate the reorganization of the actin cytoskeleton. It plays a role in tubule formation and in endothelial cell survival. Polymorphisms in this gene are associated with susceptibility to myocardial infarction, coronary artery disease and cervical artery dissection. Alternative splicing of this gene results in multiple transcript variants.
TINCR 257000 ENSG00000223573 tissue differentiation-inducing non-protein coding RNA This gene produces a spliced long non-coding RNA that is required for normal epidermal differentiation. This transcript regulates the expression of genes involved in the differentiation of epidermal tissue. Mutations in some of the genes targeted by this transcript have been implicated in epidermal skin diseases.
HBA2 3040 ENSG00000188536 hemoglobin subunit alpha 2 The human alpha globin gene cluster located on chromosome 16 spans about 30 kb and includes seven loci: 5’- zeta - pseudozeta - mu - pseudoalpha-1 - alpha-2 - alpha-1 - theta - 3’. The alpha-2 (HBA2) and alpha-1 (HBA1) coding sequences are identical. These genes differ slightly over the 5’ untranslated regions and the introns, but they differ significantly over the 3’ untranslated regions. Two alpha chains plus two beta chains constitute HbA, which in normal adult life comprises about 97% of the total hemoglobin; alpha chains combine with delta chains to constitute HbA-2, which with HbF (fetal hemoglobin) makes up the remaining 3% of adult hemoglobin. Alpha thalassemias result from deletions of each of the alpha genes as well as deletions of both HBA2 and HBA1; some nondeletion alpha thalassemias have also been reported.
ANXA1 301 ENSG00000135046 annexin A1 This gene encodes a membrane-localized protein that binds phospholipids. This protein inhibits phospholipase A2 and has anti-inflammatory activity. Loss of function or expression of this gene has been detected in multiple tumors.
CREB3L1 90993 ENSG00000157613 cAMP responsive element binding protein 3 like 1 The protein encoded by this gene is normally found in the membrane of the endoplasmic reticulum (ER). However, upon stress to the ER, the encoded protein is cleaved and the released cytoplasmic transcription factor domain translocates to the nucleus. There it activates the transcription of target genes by binding to box-B elements.
CCDC151 115948 ENSG00000198003 coiled-coil domain containing 151 This gene encodes a protein containing coiled-coil domains. The encoded protein functions in outer dynein arm assembly and is required for motile cilia function. Mutations in this gene result in primary ciliary dyskinesia. Alternative splicing results in multiple transcript variants encoding different isoforms.
GNA15 2769 ENSG00000060558 G protein subunit alpha 15 NA
CORO2A 7464 ENSG00000106789 coronin 2A This gene encodes a member of the WD repeat protein family. WD repeats are minimally conserved regions of approximately 40 amino acids typically bracketed by gly-his and trp-asp (GH-WD), which may facilitate formation of heterotrimeric or multiprotein complexes. Members of this family are involved in a variety of cellular processes, including cell cycle progression, signal transduction, apoptosis, and gene regulation. This protein contains 5 WD repeats, and has a structural similarity with actin-binding proteins: the D. discoideum coronin and the human p57 protein, suggesting that this protein may also be an actin-binding protein that regulates cell motility. Alternative splicing of this gene generates 2 transcript variants.
BNIPL 149428 ENSG00000163141 BCL2/adenovirus E1B 19kD interacting protein like The protein encoded by this gene interacts with several other proteins, such as BCL2, ARHGAP1, MIF and GFER. It may function as a bridge molecule between BCL2 and ARHGAP1/CDC42 in promoting cell death. Alternatively spliced transcript variants encoding different isoforms have been described for this gene.
SLC16A9 220963 ENSG00000165449 solute carrier family 16 member 9 NA
FUT2 2524 ENSG00000176920 fucosyltransferase 2 The protein encoded by this gene is a Golgi stack membrane protein that is involved in the creation of a precursor of the H antigen, which is required for the final step in the soluble A and B antigen synthesis pathway. This gene is one of two encoding the galactoside 2-L-fucosyltransferase enzyme. Two transcript variants encoding the same protein have been found for this gene.
HBA1 3039 ENSG00000206172 hemoglobin subunit alpha 1 The human alpha globin gene cluster located on chromosome 16 spans about 30 kb and includes seven loci: 5’- zeta - pseudozeta - mu - pseudoalpha-1 - alpha-2 - alpha-1 - theta - 3’. The alpha-2 (HBA2) and alpha-1 (HBA1) coding sequences are identical. These genes differ slightly over the 5’ untranslated regions and the introns, but they differ significantly over the 3’ untranslated regions. Two alpha chains plus two beta chains constitute HbA, which in normal adult life comprises about 97% of the total hemoglobin; alpha chains combine with delta chains to constitute HbA-2, which with HbF (fetal hemoglobin) makes up the remaining 3% of adult hemoglobin. Alpha thalassemias result from deletions of each of the alpha genes as well as deletions of both HBA2 and HBA1; some nondeletion alpha thalassemias have also been reported.
CRP 1401 ENSG00000132693 C-reactive protein, pentraxin-related The protein encoded by this gene belongs to the pentaxin family. It is involved in several host defense related functions based on its ability to recognize foreign pathogens and damaged cells of the host and to initiate their elimination by interacting with humoral and cellular effector systems in the blood. Consequently, the level of this protein in plasma increases greatly during acute phase response to tissue injury, infection, or other inflammatory stimuli.
LY6D 8581 ENSG00000167656 lymphocyte antigen 6 complex, locus D NA
S100A16 140576 ENSG00000188643 S100 calcium binding protein A16 NA
ATF7IP2 80063 ENSG00000166669 activating transcription factor 7 interacting protein 2 NA
HAS3 3038 ENSG00000103044 hyaluronan synthase 3 The protein encoded by this gene is involved in the synthesis of the unbranched glycosaminoglycan hyaluronan, or hyaluronic acid, which is a major constituent of the extracellular matrix. This gene is a member of the NODC/HAS gene family. Compared to the proteins encoded by other members of this gene family, this protein appears to be more of a regulator of hyaluronan synthesis. Alternative splicing results in multiple transcript variants.
AC004951.5 ENSG00000239556 ENSG00000239556 NA NA
ALDH1A3 220 ENSG00000184254 aldehyde dehydrogenase 1 family member A3 This gene encodes an aldehyde dehydrogenase enzyme that uses retinal as a substrate. Mutations in this gene have been associated with microphthalmia, isolated 8, and expression changes have also been detected in tumor cells. Alternative splicing results in multiple transcript variants.
ALOX12 239 ENSG00000108839 arachidonate 12-lipoxygenase, 12S type NA
ATG9B 285973 ENSG00000181652 autophagy related 9B This gene functions in the regulation of autophagy, a lysosomal degradation pathway. This gene also functions as an antisense transcript in the posttranscriptional regulation of the endothelial nitric oxide synthase 3 gene, which has 3’ overlap with this gene on the opposite strand. Mutations in this gene and disruption of the autophagy process have been associated with multiple cancers. Alternative splicing results in multiple transcript variants.
ANKRD65 441869 ENSG00000235098 ankyrin repeat domain 65 NA
GBP1P1 400759 ENSG00000225492 guanylate binding protein 1 pseudogene 1 NA
CDKN2B 1030 ENSG00000147883 cyclin-dependent kinase inhibitor 2B This gene lies adjacent to the tumor suppressor gene CDKN2A in a region that is frequently mutated and deleted in a wide variety of tumors. This gene encodes a cyclin-dependent kinase inhibitor, which forms a complex with CDK4 or CDK6, and prevents the activation of the CDK kinases, thus the encoded protein functions as a cell growth regulator that controls cell cycle G1 progression. The expression of this gene was found to be dramatically induced by TGF beta, which suggested its role in the TGF beta induced growth inhibition. Two alternatively spliced transcript variants of this gene, which encode distinct proteins, have been reported.
RP11-798K23.5 ENSG00000253520 ENSG00000253520 NA NA
SLC16A6 9120 ENSG00000108932 solute carrier family 16 member 6 NA
HPDL 84842 ENSG00000186603 4-hydroxyphenylpyruvate dioxygenase like NA
CCND2-AS1 103752584 ENSG00000256164 CCND2 antisense RNA 1 NA
RAPGEFL1 51195 ENSG00000108352 Rap guanine nucleotide exchange factor like 1 NA
CTD-2201G16.1 ENSG00000258444 ENSG00000258444 NA NA
TMEM79 84283 ENSG00000163472 transmembrane protein 79 NA
N4BP3 23138 ENSG00000145911 NEDD4 binding protein 3 NA
ARG2 384 ENSG00000081181 arginase 2 Arginase catalyzes the hydrolysis of arginine to ornithine and urea. At least two isoforms of mammalian arginase exist (types I and II), which differ in their tissue distribution, subcellular localization, immunologic crossreactivity and physiologic function. The type II isoform encoded by this gene is located in the mitochondria and expressed in extra-hepatic tissues, especially kidney. The physiologic role of this isoform is poorly understood; it is thought to play a role in nitric oxide and polyamine metabolism. Transcript variants of the type II gene resulting from the use of alternative polyadenylation sites have been described.
MUC20P1 ENSG00000224769 ENSG00000224769 mucin 20, cell surface associated pseudogene 1 NA
MYH7 4625 ENSG00000092054 myosin, heavy chain 7, cardiac muscle, beta Muscle myosin is a hexameric protein containing 2 heavy chain subunits, 2 alkali light chain subunits, and 2 regulatory light chain subunits. This gene encodes the beta (or slow) heavy chain subunit of cardiac myosin. It is expressed predominantly in normal human ventricle. It is also expressed in skeletal muscle tissues rich in slow-twitch type I muscle fibers. Changes in the relative abundance of this protein and the alpha (or fast) heavy subunit of cardiac myosin correlate with the contractile velocity of cardiac muscle. Its expression is also altered during thyroid hormone depletion and hemodynamic overloading. Mutations in this gene are associated with familial hypertrophic cardiomyopathy, myosin storage myopathy, dilated cardiomyopathy, and Laing early-onset distal myopathy.
CCDC114 93233 ENSG00000105479 coiled-coil domain containing 114 This gene encodes a coiled-coil domain-containing protein that is a component of the outer dynein arm docking complex in cilia cells. Mutations in this gene may cause primary ciliary dyskinesia 20.
RHOV 171177 ENSG00000104140 ras homolog family member V NA
SH3BGR 6450 ENSG00000185437 SH3 domain binding glutamate rich protein NA
RP11-316O14.1 ENSG00000268603 ENSG00000268603 NA NA
HBB 3043 ENSG00000244734 hemoglobin subunit beta The alpha (HBA) and beta (HBB) loci determine the structure of the 2 types of polypeptide chains in adult hemoglobin, Hb A. The normal adult hemoglobin tetramer consists of two alpha chains and two beta chains. Mutant beta globin causes sickle cell anemia. Absence of beta chain causes beta-zero-thalassemia. Reduced amounts of detectable beta globin causes beta-plus-thalassemia. The order of the genes in the beta-globin cluster is 5’-epsilon – gamma-G – gamma-A – delta – beta–3’.
SLC9A3 6550 ENSG00000066230 solute carrier family 9 member A3 The protein encoded by this gene is an epithelial brush border Na/H exchanger that uses an inward sodium ion gradient to expel acids from the cell. Defects in this gene are a cause of congenital secretory sodium diarrhea. Pseudogenes of this gene exist on chromosomes 10 and 22.
FAM117A 81558 ENSG00000121104 family with sequence similarity 117 member A NA
SNRPN 6638 ENSG00000128739 small nuclear ribonucleoprotein polypeptide N The protein encoded by this gene is one polypeptide of a small nuclear ribonucleoprotein complex and belongs to the snRNP SMB/SMN family. The protein plays a role in pre-mRNA processing, possibly tissue-specific alternative splicing events. Although individual snRNPs are believed to recognize specific nucleic acid sequences through RNA-RNA base pairing, the specific role of this family member is unknown. The protein arises from a bicistronic transcript that also encodes a protein identified as the SNRPN upstream reading frame (SNURF). Multiple transcription initiation sites have been identified and extensive alternative splicing occurs in the 5’ untranslated region. Additional splice variants have been described but sequences for the complete transcripts have not been determined. The 5’ UTR of this gene has been identified as an imprinting center. Alternative splicing or deletion caused by a translocation event in this paternally-expressed region is responsible for Angelman syndrome or Prader-Willi syndrome due to parental imprint switch failure.
SLC45A4 57210 ENSG00000022567 solute carrier family 45 member 4 NA
DSC2 1824 ENSG00000134755 desmocollin 2 This gene encodes a member of the desmocollin protein subfamily. Desmocollins, along with desmogleins, are cadherin-like transmembrane glycoproteins that are major components of the desmosome. Desmosomes are cell-cell junctions that help resist shearing forces and are found in high concentrations in cells subject to mechanical stress. This gene is found in a cluster with other desmocollin family members on chromosome 18. Mutations in this gene are associated with arrhythmogenic right ventricular dysplasia-11, and reduced protein expression has been described in several types of cancer. Alternative splicing results in multiple transcript variants.
TTC25 83538 ENSG00000204815 tetratricopeptide repeat domain 25 NA
CHPF 79586 ENSG00000123989 chondroitin polymerizing factor NA
TNNI3 7137 ENSG00000129991 troponin I3, cardiac type Troponin I (TnI), along with troponin T (TnT) and troponin C (TnC), is one of 3 subunits that form the troponin complex of the thin filaments of striated muscle. TnI is the inhibitory subunit, blocking actin-myosin interactions and thereby mediating striated muscle relaxation. The TnI subfamily contains three genes: TnI-skeletal-fast-twitch, TnI-skeletal-slow-twitch, and TnI-cardiac. This gene encodes the TnI-cardiac protein and is exclusively expressed in cardiac muscle tissues. Mutations in this gene cause familial hypertrophic cardiomyopathy type 7 (CMH7) and familial restrictive cardiomyopathy (RCM).
LMF1 64788 ENSG00000260807 lipase maturation factor 1 The protein encoded by this gene resides in the endoplasmic reticulum, and is involved in the maturation and transport of lipoprotein lipase through the secretory pathway. Mutations in this gene are associated with combined lipase deficiency. Alternatively spliced transcript variants have been found for this gene.
DBNDD1 79007 ENSG00000003249 dysbindin (dystrobrevin binding protein 1) domain containing 1 NA
FGA 2243 ENSG00000171560 fibrinogen alpha chain This gene encodes the alpha subunit of the coagulation factor fibrinogen, which is a component of the blood clot. Following vascular injury, the encoded preproprotein is proteolytically processed by thrombin during the conversion of fibrinogen to fibrin. Mutations in this gene lead to several disorders, including dysfibrinogenemia, hypofibrinogenemia, afibrinogenemia and renal amyloidosis. Alternative splicing results in multiple transcript variants, at least one of which encodes an isoform that undergoes proteolytic processing.
SOX15 6665 ENSG00000129194 SRY-box 15 This gene encodes a member of the SOX (SRY-related HMG-box) family of transcription factors involved in the regulation of embryonic development and in the determination of the cell fate. The encoded protein may act as a transcriptional regulator after forming a protein complex with other proteins.
AHNAK2 113146 ENSG00000185567 AHNAK nucleoprotein 2 NA
MTND6P4 ENSG00000249119 ENSG00000249119 mitochondrially encoded NADH:ubiquinone oxidoreductase core subunit 6 pseudogene 4 NA
BOK 666 ENSG00000176720 BCL2-related ovarian killer The protein encoded by this gene belongs to the BCL2 family, members of which form homo- or heterodimers, and act as anti- or proapoptotic regulators that are involved in a wide variety of cellular processes. Studies in rat show that this protein has restricted expression in reproductive tissues, interacts strongly with some antiapoptotic BCL2 proteins, not at all with proapoptotic BCL2 proteins, and induces apoptosis in transfected cells. Thus, this protein represents a proapoptotic member of the BCL2 family.
FABP5P7 ENSG00000234964 ENSG00000234964 fatty acid binding protein 5 pseudogene 7 NA
COL4A4 1286 ENSG00000081052 collagen type IV alpha 4 chain This gene encodes one of the six subunits of type IV collagen, the major structural component of basement membranes. This particular collagen IV subunit, however, is only found in a subset of basement membranes. Like the other members of the type IV collagen gene family, this gene is organized in a head-to-head conformation with another type IV collagen gene so that each gene pair shares a common promoter. Mutations in this gene are associated with type II autosomal recessive Alport syndrome (hereditary glomerulonephropathy) and with familial benign hematuria (thin basement membrane disease). Two transcripts, differing only in their transcription start sites, have been identified for this gene and, as is common for collagen genes, multiple polyadenylation sites are found in the 3’ UTR.
PHYHIP 9796 ENSG00000168490 phytanoyl-CoA 2-hydroxylase interacting protein NA
RP11-732A19.5 ENSG00000255390 ENSG00000255390 NA NA
APOL4 80832 ENSG00000100336 apolipoprotein L4 The protein encoded by this gene is a member of the apolipoprotein L family and may play a role in lipid exchange and transport throughout the body, as well as in reverse cholesterol transport from peripheral cells to the liver. Two transcript variants encoding two different isoforms have been found for this gene. Only one of the isoforms appears to be a secreted protein.
MMP23B 8510 ENSG00000189409 matrix metallopeptidase 23B This gene (MMP23B) encodes a member of the matrix metalloproteinase (MMP) family, and it is part of a duplicated region of chromosome 1p36.3. Proteins of the matrix metalloproteinase (MMP) family are involved in the breakdown of extracellular matrix in normal physiological processes, such as embryonic development, reproduction, and tissue remodeling, as well as in disease processes, such as arthritis and metastasis. This gene belongs to the more telomeric copy of the duplicated region.
OXCT2P1 ENSG00000237624 ENSG00000237624 3-oxoacid CoA-transferase 2 pseudogene 1 NA
DCHS1 8642 ENSG00000166341 dachsous cadherin-related 1 This gene is a member of the cadherin superfamily whose members encode calcium-dependent cell-cell adhesion molecules. The encoded protein has a signal peptide, 27 cadherin repeat domains and a unique cytoplasmic region. This particular cadherin family member is expressed in fibroblasts but not in melanocytes or keratinocytes. The cell-cell adhesion of fibroblasts is thought to be necessary for wound healing.
MXD1 4084 ENSG00000059728 MAX dimerization protein 1 This gene encodes a member of the MYC/MAX/MAD network of basic helix-loop-helix leucine zipper transcription factors. The MYC/MAX/MAD transcription factors mediate cellular proliferation, differentiation and apoptosis. The encoded protein antagonizes MYC-mediated transcriptional activation of target genes by competing for the binding partner MAX and recruiting repressor complexes containing histone deacetylases. Mutations in this gene may play a role in acute leukemia, and the encoded protein is a potential tumor suppressor. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene.
write.table(as.factor(out$query), paste0("../utilities/GTEX2013_sparse_load_voom/gene_names_clus_",18,".txt"), col.names = FALSE,
            row.names=FALSE, quote=FALSE);
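
The query-and-write steps above repeat for every factor, so they could be wrapped in a small helper. The following is a minimal sketch, not the code used in this analysis: `annotate_factor`, `query_fun`, and `out_dir` are hypothetical names introduced for illustration, and the default query simply reuses the `mygene::queryMany` call shown in this document.

```r
# Hypothetical helper (not part of the original analysis): annotate the genes
# of factor k and write the queried Ensembl IDs to a text file in out_dir.
annotate_factor <- function(gene_list, k, out_dir,
                            query_fun = function(ids) {
                              # Same query as used for each factor in this report
                              mygene::queryMany(ids, scopes = "ensembl.gene",
                                                fields = c("name", "summary", "symbol"),
                                                species = "human")
                            }) {
  # Row k of gene_list holds the top Ensembl gene IDs for factor k
  out <- as.data.frame(query_fun(gene_list[k, ]))
  out_file <- file.path(out_dir, paste0("gene_names_clus_", k, ".txt"))
  # One gene ID per line, without header, row names, or quotes
  write.table(out$query, out_file,
              col.names = FALSE, row.names = FALSE, quote = FALSE)
  # Return the annotation table invisibly so it can be passed to kable()
  invisible(out)
}
```

Under these assumptions, a call such as `kable(annotate_factor(gene_list, 19, "../utilities/GTEX2013_sparse_load_voom"))` would produce the table for one factor and write its gene list in a single step. Passing the query as an argument keeps the helper usable offline with a stand-in function.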

Factor 19 Annotations

out <- mygene::queryMany(gene_list[19,],  scopes="ensembl.gene", fields=c("name", "summary", "symbol"), species="human");
## Finished
## Pass returnall=TRUE to return lists of duplicate or missing query terms.
kable(as.data.frame(out))
name X_id summary symbol query notfound
regulator of G-protein signaling 1 5996 This gene encodes a member of the regulator of G-protein signalling family. This protein is located on the cytosolic side of the plasma membrane and contains a conserved, 120 amino acid motif called the RGS domain. The protein attenuates the signalling activity of G-proteins by binding to activated, GTP-bound G alpha subunits and acting as a GTPase activating protein (GAP), increasing the rate of conversion of the GTP to GDP. This hydrolysis allows the G alpha subunits to bind G beta/gamma subunit heterodimers, forming inactive G-protein heterotrimers, thereby terminating the signal. RGS1 ENSG00000090104 NA
complement component 7 730 C7 is a component of the complement system. It participates in the formation of Membrane Attack Complex (MAC). People with C7 deficiency are prone to bacterial infection. C7 ENSG00000112936 NA
indolethylamine N-methyltransferase 11185 N-methylation of endogenous and xenobiotic compounds is a major method by which they are degraded. This gene encodes an enzyme that N-methylates indoles such as tryptamine. Alternative splicing results in multiple transcript variants. Read-through transcription also exists between this gene and the downstream FAM188B (family with sequence similarity 188, member B) gene. INMT ENSG00000241644 NA
calponin 1 1264 NA CNN1 ENSG00000130176 NA
myelin protein zero like 2 10205 Thymus development depends on a complex series of interactions between thymocytes and the stromal component of the organ. Epithelial V-like antigen (EVA) is expressed in thymus epithelium and strongly downregulated by thymocyte developmental progression. This gene is expressed in the thymus and in several epithelial structures early in embryogenesis. It is highly homologous to the myelin protein zero and, in thymus-derived epithelial cell lines, is poorly soluble in nonionic detergents, strongly suggesting an association to the cytoskeleton. Its capacity to mediate cell adhesion through a homophilic interaction and its selective regulation by T cell maturation might imply the participation of EVA in the earliest phases of thymus organogenesis. The protein bears a characteristic V-type domain and two potential N-glycosylation sites in the extracellular domain; a putative serine phosphorylation site for casein kinase 2 is also present in the cytoplasmic tail. Two transcript variants encoding the same protein have been found for this gene. MPZL2 ENSG00000149573 NA
tryptase alpha/beta 1 7177 Tryptases comprise a family of trypsin-like serine proteases, the peptidase family S1. Tryptases are enzymatically active only as heparin-stabilized tetramers, and they are resistant to all known endogenous proteinase inhibitors. Several tryptase genes are clustered on chromosome 16p13.3. These genes are characterized by several distinct features. They have a highly conserved 3’ UTR and contain tandem repeat sequences at the 5’ flank and 3’ UTR which are thought to play a role in regulation of the mRNA stability. These genes have an intron immediately upstream of the initiator Met codon, which separates the site of transcription initiation from protein coding sequence. This feature is characteristic of tryptases but is unusual in other genes. The alleles of this gene exhibit an unusual amount of sequence variation, such that the alleles were once thought to represent two separate genes, alpha and beta 1. Beta tryptases appear to be the main isoenzymes expressed in mast cells; whereas in basophils, alpha tryptases predominate. Tryptases have been implicated as mediators in the pathogenesis of asthma and other allergic and inflammatory disorders. TPSAB1 ENSG00000172236 NA
NA ENSG00000263065 NA AF001548.6 ENSG00000263065 NA
solute carrier organic anion transporter family member 2A1 6578 This gene encodes a prostaglandin transporter that is a member of the 12-membrane-spanning superfamily of transporters. The encoded protein may be involved in mediating the uptake and clearance of prostaglandins in numerous tissues. SLCO2A1 ENSG00000174640 NA
actin, gamma 2, smooth muscle, enteric 72 Actins are highly conserved proteins that are involved in various types of cell motility and in the maintenance of the cytoskeleton. Three types of actins, alpha, beta and gamma, have been identified in vertebrates. Alpha actins are found in muscle tissues and are a major constituent of the contractile apparatus. The beta and gamma actins co-exist in most cell types as components of the cytoskeleton and as mediators of internal cell motility. This gene encodes actin gamma 2; a smooth muscle actin found in enteric tissues. Alternative splicing results in multiple transcript variants encoding distinct isoforms. Based on similarity to peptide cleavage of related actins, the mature protein of this gene is formed by removal of two N-terminal peptides. ACTG2 ENSG00000163017 NA
myosin, heavy chain 11, smooth muscle 4629 The protein encoded by this gene is a smooth muscle myosin belonging to the myosin heavy chain family. The gene product is a subunit of a hexameric protein that consists of two heavy chain subunits and two pairs of non-identical light chain subunits. It functions as a major contractile protein, converting chemical energy into mechanical energy through the hydrolysis of ATP. The gene encoding a human ortholog of rat NUDE1 is transcribed from the reverse strand of this gene, and its 3’ end overlaps with that of the latter. The pericentric inversion of chromosome 16 [inv(16)(p13q22)] produces a chimeric transcript that encodes a protein consisting of the first 165 residues from the N terminus of core-binding factor beta in a fusion with the C-terminal portion of the smooth muscle myosin heavy chain. This chromosomal rearrangement is associated with acute myeloid leukemia of the M4Eo subtype. Alternative splicing generates isoforms that are differentially expressed, with ratios changing during muscle cell maturation. Alternatively spliced transcript variants encoding different isoforms have been identified. MYH11 ENSG00000133392 NA
C-C motif chemokine ligand 21 6366 This antimicrobial gene is one of several CC cytokine genes clustered on the p-arm of chromosome 9. Cytokines are a family of secreted proteins involved in immunoregulatory and inflammatory processes. The CC cytokines are proteins characterized by two adjacent cysteines. Similar to other chemokines the protein encoded by this gene inhibits hemopoiesis and stimulates chemotaxis. This protein is chemotactic in vitro for thymocytes and activated T cells, but not for B cells, macrophages, or neutrophils. The cytokine encoded by this gene may also play a role in mediating homing of lymphocytes to secondary lymphoid organs. It is a high affinity functional ligand for chemokine receptor 7 that is expressed on T and B lymphocytes and a known receptor for another member of the cytokine family (small inducible cytokine A19). CCL21 ENSG00000137077 NA
C-C motif chemokine ligand 19 6363 This antimicrobial gene is one of several CC cytokine genes clustered on the p-arm of chromosome 9. Cytokines are a family of secreted proteins involved in immunoregulatory and inflammatory processes. The CC cytokines are proteins characterized by two adjacent cysteines. The cytokine encoded by this gene may play a role in normal lymphocyte recirculation and homing. It also plays an important role in trafficking of T cells in thymus, and in T cell and B cell migration to secondary lymphoid organs. It specifically binds to chemokine receptor CCR7. CCL19 ENSG00000172724 NA
charged multivesicular body protein 4C 92421 CHMP4C belongs to the chromatin-modifying protein/charged multivesicular body protein (CHMP) family. These proteins are components of ESCRT-III (endosomal sorting complex required for transport III), a complex involved in degradation of surface receptor proteins and formation of endocytic multivesicular bodies (MVBs). Some CHMPs have both nuclear and cytoplasmic/vesicular distributions, and one such CHMP, CHMP1A (MIM 164010), is required for both MVB formation and regulation of cell cycle progression (Tsang et al., 2006 [PubMed 16730941]). CHMP4C ENSG00000164695 NA
mucin 1, cell surface associated 4582 This gene encodes a membrane-bound protein that is a member of the mucin family. Mucins are O-glycosylated proteins that play an essential role in forming protective mucous barriers on epithelial surfaces. These proteins also play a role in intracellular signaling. This protein is expressed on the apical surface of epithelial cells that line the mucosal surfaces of many different tissues including lung, breast, stomach and pancreas. This protein is proteolytically cleaved into alpha and beta subunits that form a heterodimeric complex. The N-terminal alpha subunit functions in cell-adhesion and the C-terminal beta subunit is involved in cell signaling. Overexpression, aberrant intracellular localization, and changes in glycosylation of this protein have been associated with carcinomas. This gene is known to contain a highly polymorphic variable number tandem repeats (VNTR) domain. Alternate splicing results in multiple transcript variants. MUC1 ENSG00000185499 NA
gap junction protein beta 2 2706 This gene encodes a member of the gap junction protein family. The gap junctions were first characterized by electron microscopy as regionally specialized structures on plasma membranes of contacting adherent cells. These structures were shown to consist of cell-to-cell channels that facilitate the transfer of ions and small molecules between cells. The gap junction proteins, also known as connexins, purified from fractions of enriched gap junctions from different tissues differ. According to sequence similarities at the nucleotide and amino acid levels, the gap junction proteins are divided into two categories, alpha and beta. Mutations in this gene are responsible for as much as 50% of pre-lingual, recessive deafness. GJB2 ENSG00000165474 NA
osteoglycin 4969 This gene encodes a member of the small leucine-rich proteoglycan (SLRP) family of proteins. The encoded protein induces ectopic bone formation in conjunction with transforming growth factor beta and may regulate osteoblast differentiation. High expression of the encoded protein may be associated with elevated heart left ventricular mass. Alternative splicing results in multiple transcript variants. OGN ENSG00000106809 NA
NA NA NA NA ENSG00000259716 TRUE
apolipoprotein L4 80832 The protein encoded by this gene is a member of the apolipoprotein L family and may play a role in lipid exchange and transport throughout the body, as well as in reverse cholesterol transport from peripheral cells to the liver. Two transcript variants encoding two different isoforms have been found for this gene. Only one of the isoforms appears to be a secreted protein. APOL4 ENSG00000100336 NA
myosin light chain 9 10398 Myosin, a structural component of muscle, consists of two heavy chains and four light chains. The protein encoded by this gene is a myosin light chain that may regulate muscle contraction by modulating the ATPase activity of myosin heads. The encoded protein binds calcium and is activated by myosin light chain kinase. Two transcript variants encoding different isoforms have been found for this gene. MYL9 ENSG00000101335 NA
cadherin EGF LAG seven-pass G-type receptor 1 9620 The protein encoded by this gene is a member of the flamingo subfamily, part of the cadherin superfamily. The flamingo subfamily consists of nonclassic-type cadherins; a subpopulation that does not interact with catenins. The flamingo cadherins are located at the plasma membrane and have nine cadherin domains, seven epidermal growth factor-like repeats and two laminin A G-type repeats in their ectodomain. They also have seven transmembrane domains, a characteristic unique to this subfamily. It is postulated that these proteins are receptors involved in contact-mediated communication, with cadherin domains acting as homophilic binding regions and the EGF-like domains involved in cell adhesion and receptor-ligand interactions. This particular member is a developmentally regulated, neural-specific gene which plays an unspecified role in early embryogenesis. CELSR1 ENSG00000075275 NA
NA NA NA NA ENSG00000187990 TRUE
keratin 8 3856 This gene is a member of the type II keratin family clustered on the long arm of chromosome 12. Type I and type II keratins heteropolymerize to form intermediate-sized filaments in the cytoplasm of epithelial cells. The product of this gene typically dimerizes with keratin 18 to form an intermediate filament in simple single-layered epithelial cells. This protein plays a role in maintaining cellular structural integrity and also functions in signal transduction and cellular differentiation. Mutations in this gene cause cryptogenic cirrhosis. Alternatively spliced transcript variants have been found for this gene. KRT8 ENSG00000170421 NA
myocilin 4653 MYOC encodes the protein myocilin, which is believed to have a role in cytoskeletal function. MYOC is expressed in many ocular tissues, including the trabecular meshwork, and was revealed to be the trabecular meshwork glucocorticoid-inducible response protein (TIGR). The trabecular meshwork is a specialized eye tissue essential in regulating intraocular pressure, and mutations in MYOC have been identified as the cause of hereditary juvenile-onset open-angle glaucoma. MYOC ENSG00000034971 NA
NA NA NA NA ENSG00000180672 TRUE
nephronectin 255743 NA NPNT ENSG00000168743 NA
NA ENSG00000269936 NA RP11-394O4.5 ENSG00000269936 NA
actin, alpha 2, smooth muscle, aorta 59 The protein encoded by this gene belongs to the actin family of proteins, which are highly conserved proteins that play a role in cell motility, structure and integrity. Alpha, beta and gamma actin isoforms have been identified, with alpha actins being a major constituent of the contractile apparatus, while beta and gamma actins are involved in the regulation of cell motility. This actin is an alpha actin that is found in smooth muscle. Defects in this gene cause aortic aneurysm familial thoracic type 6. Multiple alternatively spliced variants, encoding the same protein, have been identified. ACTA2 ENSG00000107796 NA
latent transforming growth factor beta binding protein 4 8425 The protein encoded by this gene binds transforming growth factor beta (TGFB) as it is secreted and targeted to the extracellular matrix. TGFB is biologically latent after secretion and insertion into the extracellular matrix, and sheds TGFB and other proteins upon activation. Defects in this gene may be a cause of cutis laxa and severe pulmonary, gastrointestinal, and urinary abnormalities. Three transcript variants encoding different isoforms have been found for this gene. LTBP4 ENSG00000090006 NA
lipin 3 64900 The protein encoded by this gene is a member of the lipin family of proteins, and all family members share strong homology in their C-terminal region. This protein is thought to form hetero-oligomers with other lipin family members, while one family member, lipin 1, can also form homo-oligomers. This protein contains conserved motifs for phosphatidate phosphatase 1 (PAP1) activity as well as a domain that interacts with a transcriptional co-activator. Lipin complexes act in the cytoplasm to catalyze the dephosphorylation of phosphatidic acid to produce diacylglycerol, which is the precursor of both triglycerides and phospholipids. Lipin complexes are also thought to regulate gene expression as transcriptional co-activators in the nucleus. Alternative splicing results in multiple transcript variants. LPIN3 ENSG00000132793 NA
LY6/PLAUR domain containing 3 27076 NA LYPD3 ENSG00000124466 NA
ACTA2 antisense RNA 1 ENSG00000180139 NA ACTA2-AS1 ENSG00000180139 NA
aquaporin 3 (Gill blood group) 360 This gene encodes the water channel protein aquaporin 3. Aquaporins are a family of small integral membrane proteins related to the major intrinsic protein, also known as aquaporin 0. Aquaporin 3 is localized at the basal lateral membranes of collecting duct cells in the kidney. In addition to its water channel function, aquaporin 3 has been found to facilitate the transport of nonionic small solutes such as urea and glycerol, but to a smaller degree. It has been suggested that water channels can be functionally heterogeneous and possess water and solute permeation mechanisms. Alternative splicing of this gene results in multiple transcript variants encoding different isoforms. AQP3 ENSG00000165272 NA
carboxypeptidase X (M14 family), member 2 119587 NA CPXM2 ENSG00000121898 NA
kinesin family member 23 9493 The protein encoded by this gene is a member of kinesin-like protein family. This family includes microtubule-dependent molecular motors that transport organelles within cells and move chromosomes during cell division. This protein has been shown to cross-bridge antiparallel microtubules and drive microtubule movement in vitro. Alternate splicing of this gene results in multiple transcript variants. KIF23 ENSG00000137807 NA
NA ENSG00000232993 NA RP11-334A14.5 ENSG00000232993 NA
netrin 1 9423 Netrin is included in a family of laminin-related secreted proteins. The function of this gene has not yet been defined; however, netrin is thought to be involved in axon guidance and cell migration during development. Mutations and loss of expression of netrin suggest that variation in netrin may be involved in cancer development. NTN1 ENSG00000065320 NA
prolyl 3-hydroxylase 2 55214 This gene encodes a member of the prolyl 3-hydroxylase subfamily of 2-oxo-glutarate-dependent dioxygenases. These enzymes play a critical role in collagen chain assembly, stability and cross-linking by catalyzing post-translational 3-hydroxylation of proline residues. Mutations in this gene are associated with nonsyndromic severe myopia with cataract and vitreoretinal degeneration, and downregulation of this gene may play a role in breast cancer. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene. P3H2 ENSG00000090530 NA
interleukin 1 receptor antagonist 3557 The protein encoded by this gene is a member of the interleukin 1 cytokine family. This protein inhibits the activities of interleukin 1, alpha (IL1A) and interleukin 1, beta (IL1B), and modulates a variety of interleukin 1 related immune and inflammatory responses. This gene and five other closely related cytokine genes form a gene cluster spanning approximately 400 kb on chromosome 2. A polymorphism of this gene is reported to be associated with increased risk of osteoporotic fractures and gastric cancer. Several alternatively spliced transcript variants encoding distinct isoforms have been reported. IL1RN ENSG00000136689 NA
plasmalemma vesicle associated protein 83483 NA PLVAP ENSG00000130300 NA
family with sequence similarity 46 member B 115572 NA FAM46B ENSG00000158246 NA
NA ENSG00000263335 NA AF001548.5 ENSG00000263335 NA
cingulin-like 1 84952 This gene encodes a member of the cingulin family. The encoded protein localizes to both adherens and tight cell-cell junctions and mediates junction assembly and maintenance by regulating the activity of the small GTPases RhoA and Rac1. Heterozygous chromosomal rearrangements resulting in association of the promoter for this gene with the aromatase gene are a cause of aromatase excess syndrome. Alternatively spliced transcript variants have been observed for this gene. CGNL1 ENSG00000128849 NA
RNA binding protein with multiple splicing 2 348093 NA RBPMS2 ENSG00000166831 NA
NA ENSG00000249007 NA RP11-510N19.5 ENSG00000249007 NA
proline and arginine rich end leucine rich repeat protein 5549 The protein encoded by this gene is a leucine-rich repeat protein present in connective tissue extracellular matrix. This protein functions as a molecule anchoring basement membranes to the underlying connective tissue. This protein has been shown to bind type I collagen to basement membranes and type II collagen to cartilage. It also binds the basement membrane heparan sulfate proteoglycan perlecan. This protein is suggested to be involved in the pathogenesis of Hutchinson-Gilford progeria (HGP), which is reported to lack the binding of collagen in basement membranes and cartilage. Alternatively spliced transcript variants encoding the same protein have been observed. PRELP ENSG00000188783 NA
HOXA transcript antisense RNA, myeloid-specific 1 ENSG00000233429 NA HOTAIRM1 ENSG00000233429 NA
sosondowah ankyrin repeat domain family member C 65124 NA SOWAHC ENSG00000198142 NA
NA ENSG00000253520 NA RP11-798K23.5 ENSG00000253520 NA
potassium two pore domain channel subfamily K member 6 9424 This gene encodes one of the members of the superfamily of potassium channel proteins containing two pore-forming P domains. This channel protein, considered an open rectifier, is widely expressed. It is stimulated by arachidonic acid, and inhibited by internal acidification and volatile anaesthetics. KCNK6 ENSG00000099337 NA
colorectal cancer associated 2 120376 NA COLCA2 ENSG00000214290 NA
cytochrome P450 family 2 subfamily S member 1 29785 This gene encodes a member of the cytochrome P450 superfamily of enzymes. The cytochrome P450 proteins are monooxygenases which catalyze many reactions involved in drug metabolism and synthesis of cholesterol, steroids and other lipids. This protein localizes to the endoplasmic reticulum. In rodents, the homologous protein has been shown to metabolize certain carcinogens; however, the specific function of the human protein has not been determined. CYP2S1 ENSG00000167600 NA
lumican 4060 This gene encodes a member of the small leucine-rich proteoglycan (SLRP) family that includes decorin, biglycan, fibromodulin, keratocan, epiphycan, and osteoglycin. In these bifunctional molecules, the protein moiety binds collagen fibrils and the highly charged hydrophilic glycosaminoglycans regulate interfibrillar spacings. Lumican is the major keratan sulfate proteoglycan of the cornea but is also distributed in interstitial collagenous matrices throughout the body. Lumican may regulate collagen fibril organization and circumferential growth, corneal transparency, and epithelial cell migration and tissue repair. LUM ENSG00000139329 NA
NOTCH1 associated lncRNA in T-cell acute lymphoblastic leukemia 1 ENSG00000237886 NA NALT1 ENSG00000237886 NA
NA ENSG00000271133 NA CTA-293F17.1 ENSG00000271133 NA
secreted frizzled related protein 2 6423 This gene encodes a member of the SFRP family that contains a cysteine-rich domain homologous to the putative Wnt-binding site of Frizzled proteins. SFRPs act as soluble modulators of Wnt signaling. Methylation of this gene is a potential marker for the presence of colorectal cancer. SFRP2 ENSG00000145423 NA
matrix Gla protein 4256 The protein encoded by this gene is secreted and likely acts as an inhibitor of bone formation. The encoded protein is found in the organic matrix of bone and cartilage. Defects in this gene are a cause of Keutel syndrome (KS). Two transcript variants encoding different isoforms have been found for this gene. MGP ENSG00000111341 NA
small nucleolar RNA host gene 18 ENSG00000250786 NA SNHG18 ENSG00000250786 NA
alcohol dehydrogenase 1B (class I), beta polypeptide 125 The protein encoded by this gene is a member of the alcohol dehydrogenase family. Members of this enzyme family metabolize a wide variety of substrates, including ethanol, retinol, other aliphatic alcohols, hydroxysteroids, and lipid peroxidation products. This encoded protein, consisting of several homo- and heterodimers of alpha, beta, and gamma subunits, exhibits high activity for ethanol oxidation and plays a major role in ethanol catabolism. Three genes encoding alpha, beta and gamma subunits are tandemly organized in a genomic segment as a gene cluster. Two transcript variants encoding different isoforms have been found for this gene. ADH1B ENSG00000196616 NA
purinergic receptor P2Y1 5028 The product of this gene belongs to the family of G-protein coupled receptors. This family has several receptor subtypes with different pharmacological selectivity, which overlaps in some cases, for various adenosine and uridine nucleotides. This receptor functions as a receptor for extracellular ATP and ADP. In platelets binding to ADP leads to mobilization of intracellular calcium ions via activation of phospholipase C, a change in platelet shape, and probably to platelet aggregation. P2RY1 ENSG00000169860 NA
NA NA NA NA ENSG00000268913 TRUE
CDC42 effector protein 5 148170 Cell division control protein 42 (CDC42), a small Rho GTPase, regulates the formation of F-actin-containing structures through its interaction with the downstream effector proteins. The protein encoded by this gene is a member of the Borg (binder of Rho GTPases) family of CDC42 effector proteins. Borg family proteins contain a CRIB (Cdc42/Rac interactive-binding) domain. They bind to CDC42 and regulate its function negatively. The encoded protein may inhibit c-Jun N-terminal kinase (JNK) independently of CDC42 binding. The protein may also play a role in septin organization and inducing pseudopodia formation in fibroblasts. CDC42EP5 ENSG00000167617 NA
tumor-associated calcium signal transducer 2 4070 This intronless gene encodes a carcinoma-associated antigen. This antigen is a cell surface receptor that transduces calcium signals. Mutations of this gene have been associated with gelatinous drop-like corneal dystrophy. TACSTD2 ENSG00000184292 NA
baculoviral IAP repeat containing 3 330 This gene encodes a member of the IAP family of proteins that inhibit apoptosis by binding to tumor necrosis factor receptor-associated factors TRAF1 and TRAF2, probably by interfering with activation of ICE-like proteases. The encoded protein inhibits apoptosis induced by serum deprivation but does not affect apoptosis resulting from exposure to menadione, a potent inducer of free radicals. It contains 3 baculovirus IAP repeats and a ring finger domain. Transcript variants encoding the same isoform have been identified. BIRC3 ENSG00000023445 NA
phospholipase A2 group V 5322 This gene is a member of the secretory phospholipase A2 family. It is located in a tightly-linked cluster of secretory phospholipase A2 genes on chromosome 1. The encoded enzyme catalyzes the hydrolysis of membrane phospholipids to generate lysophospholipids and free fatty acids including arachidonic acid. It preferentially hydrolyzes linoleoyl-containing phosphatidylcholine substrates. Secretion of this enzyme is thought to induce inflammatory responses in neighboring cells. Alternatively spliced transcript variants have been found, but their full-length nature has not been determined. PLA2G5 ENSG00000127472 NA
podocan 127435 NA PODN ENSG00000174348 NA
desmoplakin 1832 This gene encodes a protein that anchors intermediate filaments to desmosomal plaques and forms an obligate component of functional desmosomes. Mutations in this gene are the cause of several cardiomyopathies and keratodermas, including skin fragility-woolly hair syndrome. Alternative splicing results in multiple transcript variants. DSP ENSG00000096696 NA
erythrocyte membrane protein band 4.1 like 4A 64097 Members of the band 4.1 protein superfamily, including EPB41L4A, are thought to regulate the interaction between the cytoskeleton and plasma membrane (Ishiguro et al., 2000 [PubMed 10874211]). EPB41L4A ENSG00000129595 NA
pleckstrin homology domain containing A4 57664 NA PLEKHA4 ENSG00000105559 NA
EPH receptor A2 1969 This gene belongs to the ephrin receptor subfamily of the protein-tyrosine kinase family. EPH and EPH-related receptors have been implicated in mediating developmental events, particularly in the nervous system. Receptors in the EPH subfamily typically have a single kinase domain and an extracellular region containing a Cys-rich domain and 2 fibronectin type III repeats. The ephrin receptors are divided into 2 groups based on the similarity of their extracellular domain sequences and their affinities for binding ephrin-A and ephrin-B ligands. This gene encodes a protein that binds ephrin-A ligands. Mutations in this gene are the cause of certain genetically-related cataract disorders. EPHA2 ENSG00000142627 NA
cell adhesion molecule L1 like 10752 The protein encoded by this gene is a member of the L1 gene family of neural cell adhesion molecules. It is a neural recognition molecule that may be involved in signal transduction pathways. The deletion of one copy of this gene may be responsible for mental defects in patients with 3p- syndrome. This protein may also play a role in the growth of certain cancers. Alternate splicing results in both coding and non-coding variants. CHL1 ENSG00000134121 NA
acid phosphatase 5, tartrate resistant 54 This gene encodes an iron containing glycoprotein which catalyzes the conversion of orthophosphoric monoester to alcohol and orthophosphate. It is the most basic of the acid phosphatases and is the only form not inhibited by L(+)-tartrate. ACP5 ENSG00000102575 NA
phospholipase A2 group IIA 5320 The protein encoded by this gene is a member of the phospholipase A2 family (PLA2). PLA2s constitute a diverse family of enzymes with respect to sequence, function, localization, and divalent cation requirements. This gene product belongs to group II, which contains the secreted form of PLA2, an extracellular enzyme that has a low molecular mass and requires calcium ions for catalysis. It catalyzes the hydrolysis of the sn-2 fatty acid acyl ester bond of phosphoglycerides, releasing free fatty acids and lysophospholipids, and is thought to participate in the regulation of the phospholipid metabolism in biomembranes. Several alternatively spliced transcript variants with different 5’ UTRs have been found for this gene. PLA2G2A ENSG00000188257 NA
RAS like family 12 51285 NA RASL12 ENSG00000103710 NA
microfibrillar associated protein 4 4239 This gene encodes a protein with similarity to a bovine microfibril-associated protein. The protein has binding specificities for both collagen and carbohydrate. It is thought to be an extracellular matrix protein which is involved in cell adhesion or intercellular interactions. The gene is located within the Smith-Magenis syndrome region. Two transcript variants encoding different isoforms have been found for this gene. MFAP4 ENSG00000166482 NA
epithelial membrane protein 1 2012 NA EMP1 ENSG00000134531 NA
PERP, TP53 apoptosis effector 64065 NA PERP ENSG00000112378 NA
chromosome 15 open reading frame 48 84419 This gene was first identified in a study of human esophageal squamous cell carcinoma tissues. Levels of both the message and protein are reduced in carcinoma samples. In adult human tissues, this gene is expressed in the esophagus, stomach, small intestine, colon and placenta. Alternatively spliced transcript variants that encode the same protein have been identified. C15orf48 ENSG00000166920 NA
ADIRF antisense RNA 1 ENSG00000272734 NA ADIRF-AS1 ENSG00000272734 NA
chromosome 3 open reading frame 52 79669 NA C3orf52 ENSG00000114529 NA
von Willebrand factor A domain containing 1 64856 VWA1 belongs to the von Willebrand factor (VWF; MIM 613160) A (VWFA) domain superfamily of extracellular matrix proteins and appears to play a role in cartilage structure and function (Fitzgerald et al., 2002 [PubMed 12062410]). VWA1 ENSG00000179403 NA
tumor necrosis factor receptor superfamily member 19 55504 The protein encoded by this gene is a member of the TNF-receptor superfamily. This receptor is highly expressed during embryonic development. It has been shown to interact with TRAF family members, and to activate JNK signaling pathway when overexpressed in cells. This receptor is capable of inducing apoptosis by a caspase-independent mechanism, and it is thought to play an essential role in embryonic development. Alternatively spliced transcript variants encoding distinct isoforms have been described. TNFRSF19 ENSG00000127863 NA
chloride intracellular channel 6 54102 This gene encodes a member of the chloride intracellular channel family of proteins. The gene is part of a large triplicated region found on chromosomes 1, 6, and 21. Alternative splicing results in multiple transcript variants encoding different isoforms. CLIC6 ENSG00000159212 NA
uncharacterized LOC100506314 100506314 NA LOC100506314 ENSG00000247498 NA
insulin like growth factor binding protein 2 3485 The protein encoded by this gene is one of six similar proteins that bind insulin-like growth factors I and II (IGF-I and IGF-II). The encoded protein can be secreted into the bloodstream, where it binds IGF-I and IGF-II with high affinity, or it can remain intracellular, interacting with many different ligands. High expression levels of this protein promote the growth of several types of tumors and may be predictive of the chances of recovery of the patient. Several transcript variants, one encoding a secreted isoform and the others encoding nonsecreted isoforms, have been found for this gene. IGFBP2 ENSG00000115457 NA
aldehyde dehydrogenase 3 family member A1 218 Aldehyde dehydrogenases oxidize various aldehydes to the corresponding acids. They are involved in the detoxification of alcohol-derived acetaldehyde and in the metabolism of corticosteroids, biogenic amines, neurotransmitters, and lipid peroxidation. The enzyme encoded by this gene forms a cytoplasmic homodimer that preferentially oxidizes aromatic and medium-chain (6 carbons or more) saturated and unsaturated aldehyde substrates. It is thought to promote resistance to UV and 4-hydroxy-2-nonenal-induced oxidative damage in the cornea. The gene is located within the Smith-Magenis syndrome region on chromosome 17. Multiple alternatively spliced variants, encoding the same protein, have been identified. ALDH3A1 ENSG00000108602 NA
syntaxin 19 415117 NA STX19 ENSG00000178750 NA
phospholamban 5350 The protein encoded by this gene is found as a pentamer and is a major substrate for the cAMP-dependent protein kinase in cardiac muscle. The encoded protein is an inhibitor of cardiac muscle sarcoplasmic reticulum Ca(2+)-ATPase in the unphosphorylated state, but inhibition is relieved upon phosphorylation of the protein. The subsequent activation of the Ca(2+) pump leads to enhanced muscle relaxation rates, thereby contributing to the inotropic response elicited in heart by beta-agonists. The encoded protein is a key regulator of cardiac diastolic function. Mutations in this gene are a cause of inherited human dilated cardiomyopathy with refractory congestive heart failure, and also familial hypertrophic cardiomyopathy. PLN ENSG00000198523 NA
G protein-coupled receptor class C group 5 member A 9052 This gene encodes a member of the type 3 G protein-coupled receptor family, characterized by the signature 7-transmembrane domain motif. The encoded protein may be involved in interaction between retinoic acid and G protein signalling pathways. Retinoic acid plays a critical role in development, cellular growth, and differentiation. This gene may play a role in embryonic development and epithelial cell differentiation. GPRC5A ENSG00000013588 NA
retinoic acid receptor responder 1 5918 This gene was identified as a retinoic acid (RA) receptor-responsive gene. It encodes a type 1 membrane protein. The expression of this gene is upregulated by tazarotene as well as by retinoic acid receptors. The expression of this gene is found to be downregulated in prostate cancer, which is caused by the methylation of its promoter and CpG island. Alternatively spliced transcript variants encoding distinct isoforms have been observed. RARRES1 ENSG00000118849 NA
small cell adhesion glycoprotein 57228 NA SMAGP ENSG00000170545 NA
collagen type III alpha 1 chain 1281 This gene encodes the pro-alpha1 chains of type III collagen, a fibrillar collagen that is found in extensible connective tissues such as skin, lung, uterus, intestine and the vascular system, frequently in association with type I collagen. Mutations in this gene are associated with Ehlers-Danlos syndrome type IV, and with aortic and arterial aneurysms. Two transcripts, resulting from the use of alternate polyadenylation signals, have been identified for this gene. COL3A1 ENSG00000168542 NA
stratifin 2810 NA SFN ENSG00000175793 NA
tandem C2 domains, nuclear 123036 NA TC2N ENSG00000165929 NA
cyclin D1 595 The protein encoded by this gene belongs to the highly conserved cyclin family, whose members are characterized by a dramatic periodicity in protein abundance throughout the cell cycle. Cyclins function as regulators of CDK kinases. Different cyclins exhibit distinct expression and degradation patterns which contribute to the temporal coordination of each mitotic event. This cyclin forms a complex with and functions as a regulatory subunit of CDK4 or CDK6, whose activity is required for cell cycle G1/S transition. This protein has been shown to interact with tumor suppressor protein Rb and the expression of this gene is regulated positively by Rb. Mutations, amplification and overexpression of this gene, which alters cell cycle progression, are observed frequently in a variety of tumors and may contribute to tumorigenesis. CCND1 ENSG00000110092 NA
laminin subunit alpha 3 3909 The protein encoded by this gene belongs to the laminin family of secreted molecules. Laminins are heterotrimeric molecules that consist of alpha, beta, and gamma subunits that assemble through a coiled-coil domain. Laminins are essential for formation and function of the basement membrane and have additional functions in regulating cell migration and mechanical signal transduction. This gene encodes an alpha subunit and is responsive to several epithelial-mesenchymal regulators including keratinocyte growth factor, epidermal growth factor and insulin-like growth factor. Mutations in this gene have been identified as the cause of Herlitz type junctional epidermolysis bullosa and laryngoonychocutaneous syndrome. Alternative splicing and alternative promoter usage result in multiple transcript variants. LAMA3 ENSG00000053747 NA
superoxide dismutase 3, extracellular 6649 This gene encodes a member of the superoxide dismutase (SOD) protein family. SODs are antioxidant enzymes that catalyze the conversion of superoxide radicals into hydrogen peroxide and oxygen, which may protect the brain, lungs, and other tissues from oxidative stress. Proteolytic processing of the encoded protein results in the formation of two distinct homotetramers that differ in their ability to interact with the extracellular matrix (ECM). Homotetramers consisting of the intact protein, or type C subunit, exhibit high affinity for heparin and are anchored to the ECM. Homotetramers consisting of a proteolytically cleaved form of the protein, or type A subunit, exhibit low affinity for heparin and do not interact with the ECM. A mutation in this gene may be associated with increased heart disease risk. SOD3 ENSG00000109610 NA
EPS8 like 1 54869 This gene encodes a protein that is related to epidermal growth factor receptor pathway substrate 8 (EPS8), a substrate for the epidermal growth factor receptor. The function of this protein is unknown. At least two alternatively spliced transcript variants encoding different isoforms have been found for this gene. EPS8L1 ENSG00000131037 NA
EvC ciliary complex subunit 1 2121 This gene encodes a protein containing a leucine zipper and a transmembrane domain. This gene has been implicated in both Ellis-van Creveld syndrome (EvC) and Weyers acrodental dysostosis. EVC ENSG00000072840 NA
retinoic acid receptor responder 2 5919 This gene encodes a secreted chemotactic protein that initiates chemotaxis via the ChemR23 G protein-coupled seven-transmembrane domain ligand. Expression of this gene is upregulated by the synthetic retinoid tazarotene and occurs in a wide variety of tissues. The active protein has several roles, including that as an adipokine and as an antimicrobial protein with activity against bacteria and fungi. RARRES2 ENSG00000106538 NA
leucine rich repeat containing 3 81543 NA LRRC3 ENSG00000160233 NA
# Save the Ensembl IDs queried for factor 19, one per line.
write.table(as.factor(out$query), paste0("../utilities/GTEX2013_sparse_load_voom/gene_names_clus_", 19, ".txt"),
            col.names = FALSE, row.names = FALSE, quote = FALSE)
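
The export step above is repeated for each factor with only the factor index changing. A minimal sketch of how it could be wrapped in a helper follows. The function name `write_factor_genes`, its arguments, and the de-duplication of IDs are assumptions made for illustration and are not part of the original analysis; the toy IDs are taken from the factor 19 table, and the file is written to a temporary directory rather than to `../utilities/`.

```r
# Hypothetical helper (not in the original analysis): write the gene IDs of one
# factor to "gene_names_clus_<factor_index>.txt" inside out_dir, one ID per line.
write_factor_genes <- function(ids, factor_index, out_dir) {
  out_file <- file.path(out_dir, paste0("gene_names_clus_", factor_index, ".txt"))
  # Assumption: duplicated IDs are dropped, since a query can return
  # more than one hit for the same Ensembl ID.
  ids <- unique(as.character(ids))
  write.table(ids, out_file, col.names = FALSE, row.names = FALSE, quote = FALSE)
  invisible(out_file)
}

# Toy usage with three IDs (one repeated), written to a temporary directory.
demo_ids  <- c("ENSG00000110092", "ENSG00000053747", "ENSG00000110092")
demo_file <- write_factor_genes(demo_ids, 19, tempdir())
readLines(demo_file)
```

In the report itself the call would presumably look like `write_factor_genes(out$query, 19, "../utilities/GTEX2013_sparse_load_voom")`, which keeps the file naming used above.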

Factor 20 Annotations

out <- mygene::queryMany(gene_list[20,],  scopes="ensembl.gene", fields=c("name", "summary", "symbol"), species="human");
## Finished
## Pass returnall=TRUE to return lists of duplicate or missing query terms.
kable(as.data.frame(out))
symbol query summary name X_id notfound
MAL ENSG00000172005 The protein encoded by this gene is a highly hydrophobic integral membrane protein belonging to the MAL family of proteolipids. The protein has been localized to the endoplasmic reticulum of T-cells and is a candidate linker protein in T-cell signal transduction. In addition, this proteolipid is localized in compact myelin of cells in the nervous system and has been implicated in myelin biogenesis and/or function. The protein plays a role in the formation, stabilization and maintenance of glycosphingolipid-enriched membrane microdomains. Down-regulation of this gene has been associated with a variety of human epithelial malignancies. Alternative splicing produces four transcript variants which vary from each other by the presence or absence of alternatively spliced exons 2 and 3. mal, T-cell differentiation protein 4118 NA
CLIC3 ENSG00000169583 Chloride channels are a diverse group of proteins that regulate fundamental cellular processes including stabilization of cell membrane potential, transepithelial transport, maintenance of intracellular pH, and regulation of cell volume. Chloride intracellular channel 3 is a member of the p64 family and is predominantly localized in the nucleus and stimulates chloride ion channel activity. In addition, this protein may participate in cellular growth control, based on its association with ERK7, a member of the MAP kinase family. chloride intracellular channel 3 9022 NA
AATK ENSG00000181409 The protein encoded by this gene contains a tyrosine kinase domain at the N-terminus and a proline-rich domain at the C-terminus. This gene is induced during apoptosis, and expression of this gene may be a necessary pre-requisite for the induction of growth arrest and/or apoptosis of myeloid precursor cells. This gene has been shown to produce neuronal differentiation in a neuroblastoma cell line. Two transcript variants encoding different isoforms have been found for this gene. apoptosis-associated tyrosine kinase 9625 NA
LYZ ENSG00000090382 This gene encodes human lysozyme, whose natural substrate is the bacterial cell wall peptidoglycan (cleaving the beta[1-4]glycosidic linkages between N-acetylmuramic acid and N-acetylglucosamine). Lysozyme is one of the antimicrobial agents found in human milk, and is also present in spleen, lung, kidney, white blood cells, plasma, saliva, and tears. The protein has antibacterial activity against a number of bacterial species. Missense mutations in this gene have been identified in heritable renal amyloidosis. lysozyme 4069 NA
RNASE2 ENSG00000169385 The protein encoded by this gene is a non-secretory ribonuclease that belongs to the pancreatic ribonuclease family, a subset of the ribonuclease A superfamily. The protein has antimicrobial activity against viruses. ribonuclease A family member 2 6036 NA
SMIM5 ENSG00000204323 NA small integral membrane protein 5 643008 NA
RP11-1143G9.4 ENSG00000257764 NA NA ENSG00000257764 NA
CDA ENSG00000158825 This gene encodes an enzyme involved in pyrimidine salvaging. The encoded protein forms a homotetramer that catalyzes the irreversible hydrolytic deamination of cytidine and deoxycytidine to uridine and deoxyuridine, respectively. It is one of several deaminases responsible for maintaining the cellular pyrimidine pool. Mutations in this gene are associated with decreased sensitivity to the cytosine nucleoside analogue cytosine arabinoside used in the treatment of certain childhood leukemias. cytidine deaminase 978 NA
PDLIM4 ENSG00000131435 This gene encodes a protein which may be involved in bone development. Mutations in this gene are associated with susceptibility to osteoporosis. PDZ and LIM domain 4 8572 NA
CHAC1 ENSG00000128965 NA ChaC glutathione specific gamma-glutamylcyclotransferase 1 79094 NA
CDC42EP5 ENSG00000167617 Cell division control protein 42 (CDC42), a small Rho GTPase, regulates the formation of F-actin-containing structures through its interaction with the downstream effector proteins. The protein encoded by this gene is a member of the Borg (binder of Rho GTPases) family of CDC42 effector proteins. Borg family proteins contain a CRIB (Cdc42/Rac interactive-binding) domain. They bind to CDC42 and regulate its function negatively. The encoded protein may inhibit c-Jun N-terminal kinase (JNK) independently of CDC42 binding. The protein may also play a role in septin organization and inducing pseudopodia formation in fibroblasts. CDC42 effector protein 5 148170 NA
LIF ENSG00000128342 The protein encoded by this gene is a pleiotropic cytokine with roles in several different systems. It is involved in the induction of hematopoietic differentiation in normal and myeloid leukemia cells, induction of neuronal cell differentiation, regulator of mesenchymal to epithelial conversion during kidney development, and may also have a role in immune tolerance at the maternal-fetal interface. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene. leukemia inhibitory factor 3976 NA
COL1A1 ENSG00000108821 This gene encodes the pro-alpha1 chains of type I collagen whose triple helix comprises two alpha1 chains and one alpha2 chain. Type I is a fibril-forming collagen found in most connective tissues and is abundant in bone, cornea, dermis and tendon. Mutations in this gene are associated with osteogenesis imperfecta types I-IV, Ehlers-Danlos syndrome type VIIA, Ehlers-Danlos syndrome Classical type, Caffey Disease and idiopathic osteoporosis. Reciprocal translocations between chromosomes 17 and 22, where this gene and the gene for platelet-derived growth factor beta are located, are associated with a particular type of skin tumor called dermatofibrosarcoma protuberans, resulting from unregulated expression of the growth factor. Two transcripts, resulting from the use of alternate polyadenylation signals, have been identified for this gene. collagen type I alpha 1 1277 NA
FAM83D ENSG00000101447 NA family with sequence similarity 83 member D 81610 NA
AEBP1 ENSG00000106624 This gene encodes a member of carboxypeptidase A protein family. The encoded protein may function as a transcriptional repressor and play a role in adipogenesis and smooth muscle cell differentiation. Studies in mice suggest that this gene functions in wound healing and abdominal wall development. Overexpression of this gene is associated with glioblastoma. AE binding protein 1 165 NA
TUBB6 ENSG00000176014 NA tubulin beta 6 class V 84617 NA
TMEM52 ENSG00000178821 NA transmembrane protein 52 339456 NA
IGSF6 ENSG00000140749 NA immunoglobulin superfamily member 6 10261 NA
CGA ENSG00000135346 The four human glycoprotein hormones chorionic gonadotropin (CG), luteinizing hormone (LH), follicle stimulating hormone (FSH), and thyroid stimulating hormone (TSH) are dimers consisting of alpha and beta subunits that are associated noncovalently. The alpha subunits of these hormones are identical, however, their beta chains are unique and confer biological specificity. The protein encoded by this gene is the alpha subunit and belongs to the glycoprotein hormones alpha chain family. Two transcript variants encoding different isoforms have been found for this gene. glycoprotein hormones, alpha polypeptide 1081 NA
RAP1GAP ENSG00000076864 This gene encodes a type of GTPase-activating-protein (GAP) that down-regulates the activity of the ras-related RAP1 protein. RAP1 acts as a molecular switch by cycling between an inactive GDP-bound form and an active GTP-bound form. The product of this gene, RAP1GAP, promotes the hydrolysis of bound GTP and hence returns RAP1 to the inactive state whereas other proteins, guanine nucleotide exchange factors (GEFs), act as RAP1 activators by facilitating the conversion of RAP1 from the GDP- to the GTP-bound form. In general, ras subfamily proteins, such as RAP1, play key roles in receptor-linked signaling pathways that control cell growth and differentiation. RAP1 plays a role in diverse processes such as cell proliferation, adhesion, differentiation, and embryogenesis. Alternative splicing results in multiple transcript variants encoding distinct proteins. RAP1 GTPase activating protein 5909 NA
ATP8B1 ENSG00000081923 This gene encodes a member of the P-type cation transport ATPase family, which belongs to the subfamily of aminophospholipid-transporting ATPases. The aminophospholipid translocases transport phosphatidylserine and phosphatidylethanolamine from one side of a bilayer to another. Mutations in this gene may result in progressive familial intrahepatic cholestasis type 1 and in benign recurrent intrahepatic cholestasis. ATPase phospholipid transporting 8B1 5205 NA
TM4SF1 ENSG00000169908 The protein encoded by this gene is a member of the transmembrane 4 superfamily, also known as the tetraspanin family. Most of these members are cell-surface proteins that are characterized by the presence of four hydrophobic domains. The proteins mediate signal transduction events that play a role in the regulation of cell development, activation, growth and motility. This encoded protein is a cell surface antigen and is highly expressed in different carcinomas. transmembrane 4 L six family member 1 4071 NA
CSF3R ENSG00000119535 The protein encoded by this gene is the receptor for colony stimulating factor 3, a cytokine that controls the production, differentiation, and function of granulocytes. The encoded protein, which is a member of the family of cytokine receptors, may also function in some cell surface adhesion or recognition processes. Alternatively spliced transcript variants have been described. Mutations in this gene are a cause of Kostmann syndrome, also known as severe congenital neutropenia. colony stimulating factor 3 receptor 1441 NA
DPT ENSG00000143196 Dermatopontin is an extracellular matrix protein with possible functions in cell-matrix interactions and matrix assembly. The protein is found in various tissues and many of its tyrosine residues are sulphated. Dermatopontin is postulated to modify the behavior of TGF-beta through interaction with decorin. dermatopontin 1805 NA
GEM ENSG00000164949 The protein encoded by this gene belongs to the RAD/GEM family of GTP-binding proteins. It is associated with the inner face of the plasma membrane and could play a role as a regulatory protein in receptor-mediated signal transduction. Alternative splicing occurs at this locus and two transcript variants encoding the same protein have been identified. GTP binding protein overexpressed in skeletal muscle 2669 NA
A4GALT ENSG00000128274 The protein encoded by this gene catalyzes the transfer of galactose to lactosylceramide to form globotriaosylceramide, which has been identified as the P(k) antigen of the P blood group system. This protein, a type II membrane protein found in the Golgi, is also required for the synthesis of the bacterial verotoxins receptor. Alternatively spliced transcript variants have been found for this gene. alpha 1,4-galactosyltransferase 53947 NA
CD109 ENSG00000156535 This gene encodes a glycosyl phosphatidylinositol (GPI)-linked glycoprotein that localizes to the surface of platelets, activated T-cells, and endothelial cells. The protein binds to and negatively regulates signalling by transforming growth factor beta (TGF-beta). Multiple transcript variants encoding different isoforms have been found for this gene. CD109 molecule 135228 NA
TNC ENSG00000041982 This gene encodes an extracellular matrix protein with a spatially and temporally restricted tissue distribution. This protein is homohexameric with disulfide-linked subunits, and contains multiple EGF-like and fibronectin type-III domains. It is implicated in guidance of migrating neurons as well as axons during development, synaptic plasticity, and neuronal regeneration. tenascin C 3371 NA
FBLIM1 ENSG00000162458 This gene encodes a protein with an N-terminal filamin-binding domain, a central proline-rich domain, and multiple C-terminal LIM domains. This protein localizes at cell junctions and may link cell adhesion structures to the actin cytoskeleton. This protein may be involved in the assembly and stabilization of actin-filaments and likely plays a role in modulating cell adhesion, cell morphology and cell motility. This protein also localizes to the nucleus and may affect cardiomyocyte differentiation after binding with the CSX/NKX2-5 transcription factor. Alternative splicing results in multiple transcript variants encoding different isoforms. filamin binding LIM protein 1 54751 NA
EMILIN1 ENSG00000138080 This gene encodes an extracellular matrix glycoprotein that is characterized by an N-terminal microfibril interface domain, a coiled-coil alpha-helical domain, a collagenous domain and a C-terminal globular C1q domain. The encoded protein associates with elastic fibers at the interface between elastin and microfibrils and may play a role in the development of elastic tissues including large blood vessels, dermis, heart and lung. elastin microfibril interfacer 1 11117 NA
PPIC ENSG00000168938 The protein encoded by this gene is a member of the peptidyl-prolyl cis-trans isomerase (PPIase) family. PPIases catalyze the cis-trans isomerization of proline imidic peptide bonds in oligopeptides and accelerate the folding of proteins. Similar to other PPIases, this protein can bind immunosuppressant cyclosporin A. peptidylprolyl isomerase C 5480 NA
IGFBP4 ENSG00000141753 This gene is a member of the insulin-like growth factor binding protein (IGFBP) family and encodes a protein with an IGFBP domain and a thyroglobulin type-I domain. The protein binds both insulin-like growth factors (IGFs) I and II and circulates in the plasma in both glycosylated and non-glycosylated forms. Binding of this protein prolongs the half-life of the IGFs and alters their interaction with cell surface receptors. insulin like growth factor binding protein 4 3487 NA
BTG3 ENSG00000154640 The protein encoded by this gene is a member of the BTG/Tob family. This family has structurally related proteins that appear to have antiproliferative properties. This encoded protein might play a role in neurogenesis in the central nervous system. Two transcript variants encoding different isoforms have been found for this gene. BTG family member 3 10950 NA
MFAP4 ENSG00000166482 This gene encodes a protein with similarity to a bovine microfibril-associated protein. The protein has binding specificities for both collagen and carbohydrate. It is thought to be an extracellular matrix protein which is involved in cell adhesion or intercellular interactions. The gene is located within the Smith-Magenis syndrome region. Two transcript variants encoding different isoforms have been found for this gene. microfibrillar associated protein 4 4239 NA
TPBG ENSG00000146242 This gene encodes a leucine-rich transmembrane glycoprotein that may be involved in cell adhesion. The encoded protein is an oncofetal antigen that is specific to trophoblast cells. In adults this protein is highly expressed in many tumor cells and is associated with poor clinical outcome in numerous cancers. Alternate splicing in the 5’ UTR results in multiple transcript variants that encode the same protein. trophoblast glycoprotein 7162 NA
PHLDA2 ENSG00000181649 This gene is located in a cluster of imprinted genes on chromosome 11p15.5, which is considered to be an important tumor suppressor gene region. Alterations in this region may be associated with the Beckwith-Wiedemann syndrome, Wilms tumor, rhabdomyosarcoma, adrenocortical carcinoma, and lung, ovarian, and breast cancer. This gene has been shown to be imprinted, with preferential expression from the maternal allele in placenta and liver. pleckstrin homology like domain family A member 2 7262 NA
COL5A1 ENSG00000130635 This gene encodes an alpha chain for one of the low abundance fibrillar collagens. Fibrillar collagen molecules are trimers that can be composed of one or more types of alpha chains. Type V collagen is found in tissues containing type I collagen and appears to regulate the assembly of heterotypic fibers composed of both type I and type V collagen. This gene product is closely related to type XI collagen and it is possible that the collagen chains of types V and XI constitute a single collagen type with tissue-specific chain combinations. The encoded procollagen protein occurs commonly as the heterotrimer pro-alpha1(V)-pro-alpha1(V)-pro-alpha2(V). Mutations in this gene are associated with Ehlers-Danlos syndrome, types I and II. Alternative splicing of this gene results in multiple transcript variants. collagen type V alpha 1 1289 NA
TOX2 ENSG00000124191 NA TOX high mobility group box family member 2 84969 NA
MMP23B ENSG00000189409 This gene (MMP23B) encodes a member of the matrix metalloproteinase (MMP) family, and it is part of a duplicated region of chromosome 1p36.3. Proteins of the matrix metalloproteinase (MMP) family are involved in the breakdown of extracellular matrix in normal physiological processes, such as embryonic development, reproduction, and tissue remodeling, as well as in disease processes, such as arthritis and metastasis. This gene belongs to the more telomeric copy of the duplicated region. matrix metallopeptidase 23B 8510 NA
PPP1R1A ENSG00000135447 NA protein phosphatase 1 regulatory inhibitor subunit 1A 5502 NA
PRL ENSG00000172179 This gene encodes the anterior pituitary hormone prolactin. This secreted hormone is a growth regulator for many tissues, including cells of the immune system. It may also play a role in cell survival by suppressing apoptosis, and it is essential for lactation. Alternative splicing results in multiple transcript variants that encode the same protein. prolactin 5617 NA
COL15A1 ENSG00000204291 This gene encodes the alpha chain of type XV collagen, a member of the FACIT collagen family (fibril-associated collagens with interrupted helices). Type XV collagen has a wide tissue distribution but the strongest expression is localized to basement membrane zones so it may function to adhere basement membranes to underlying connective tissue stroma. The proteolytically produced C-terminal fragment of type XV collagen is restin, a potentially antiangiogenic protein that is closely related to endostatin. Mouse studies have shown that collagen XV deficiency is associated with muscle and microvessel deterioration. collagen type XV alpha 1 chain 1306 NA
SNORA73B ENSG00000200087 NA small nucleolar RNA, H/ACA box 73B ENSG00000200087 NA
IL12A ENSG00000168811 This gene encodes a subunit of a cytokine that acts on T and natural killer cells, and has a broad array of biological activities. The cytokine is a disulfide-linked heterodimer composed of the 35-kD subunit encoded by this gene, and a 40-kD subunit that is a member of the cytokine receptor family. This cytokine is required for the T-cell-independent induction of interferon (IFN)-gamma, and is important for the differentiation of both Th1 and Th2 cells. The responses of lymphocytes to this cytokine are mediated by the activator of transcription protein STAT4. Nitric oxide synthase 2A (NOS2A/NOS2) is found to be required for the signaling process of this cytokine in innate immunity. interleukin 12A 3592 NA
ACKR3 ENSG00000144476 This gene encodes a member of the G-protein coupled receptor family. Although this protein was earlier thought to be a receptor for vasoactive intestinal peptide (VIP), it is now considered to be an orphan receptor, in that its endogenous ligand has not been identified. The protein is also a coreceptor for human immunodeficiency viruses (HIV). Translocations involving this gene and HMGA2 on chromosome 12 have been observed in lipomas. atypical chemokine receptor 3 57007 NA
RARRES2 ENSG00000106538 This gene encodes a secreted chemotactic protein that initiates chemotaxis via the ChemR23 G protein-coupled seven-transmembrane domain ligand. Expression of this gene is upregulated by the synthetic retinoid tazarotene and occurs in a wide variety of tissues. The active protein has several roles, including that as an adipokine and as an antimicrobial protein with activity against bacteria and fungi. retinoic acid receptor responder 2 5919 NA
COL5A2 ENSG00000204262 This gene encodes an alpha chain for one of the low abundance fibrillar collagens. Fibrillar collagen molecules are trimers that can be composed of one or more types of alpha chains. Type V collagen is found in tissues containing type I collagen and appears to regulate the assembly of heterotypic fibers composed of both type I and type V collagen. This gene product is closely related to type XI collagen and it is possible that the collagen chains of types V and XI constitute a single collagen type with tissue-specific chain combinations. Mutations in this gene are associated with Ehlers-Danlos syndrome, types I and II. collagen type V alpha 2 chain 1290 NA
CSRP2 ENSG00000175183 CSRP2 is a member of the CSRP family of genes, encoding a group of LIM domain proteins, which may be involved in regulatory processes important for development and cellular differentiation. CRP2 contains two copies of the cysteine-rich amino acid sequence motif (LIM) with putative zinc-binding activity, and may be involved in regulating ordered cell growth. Other genes in the family include CSRP1 and CSRP3. Alternative splicing results in multiple transcript variants. cysteine and glycine rich protein 2 1466 NA
CNN3 ENSG00000117519 This gene encodes a protein with a markedly acidic C terminus; the basic N-terminus is highly homologous to the N-terminus of a related gene, CNN1. Members of the CNN gene family all contain similar tandemly repeated motifs. This encoded protein is associated with the cytoskeleton but is not involved in contraction. calponin 3 1266 NA
COL12A1 ENSG00000111799 This gene encodes the alpha chain of type XII collagen, a member of the FACIT (fibril-associated collagens with interrupted triple helices) collagen family. Type XII collagen is a homotrimer found in association with type I collagen, an association that is thought to modify the interactions between collagen I fibrils and the surrounding matrix. Alternatively spliced transcript variants encoding different isoforms have been identified. collagen type XII alpha 1 chain 1303 NA
PDZRN3 ENSG00000121440 This gene encodes a member of the LNX (Ligand of Numb Protein-X) family of RING-type ubiquitin E3 ligases. This protein may function in vascular morphogenesis and the differentiation of adipocytes, osteoblasts and myoblasts. This protein may be targeted for degradation by the human papilloma virus E6 protein. Alternative splicing results in multiple transcript variants. PDZ domain containing ring finger 3 23024 NA
OLAH ENSG00000152463 NA oleoyl-ACP hydrolase 55301 NA
NRARP ENSG00000198435 NA NOTCH-regulated ankyrin repeat protein 441478 NA
ECE2 ENSG00000145194 This gene encodes a member of the M13 family, which includes type 2 integral membrane metallopeptidases. The encoded enzyme is a membrane-bound zinc-dependent metalloprotease. The enzyme catalyzes the cleavage of big endothelin to produce the vasoconstrictor endothelin-1, and plays a role in the processing of several neuroendocrine peptides. It may also have methyltransferase activity. Alternative splicing results in multiple transcript variants. endothelin converting enzyme 2 9718 NA
DLL1 ENSG00000198719 DLL1 is a human homolog of the Notch Delta ligand and is a member of the delta/serrate/jagged family. It plays a role in mediating cell fate decisions during hematopoiesis. It may play a role in cell-to-cell communication. delta like canonical Notch ligand 1 28514 NA
PTGS2 ENSG00000073756 Prostaglandin-endoperoxide synthase (PTGS), also known as cyclooxygenase, is the key enzyme in prostaglandin biosynthesis, and acts both as a dioxygenase and as a peroxidase. There are two isozymes of PTGS: a constitutive PTGS1 and an inducible PTGS2, which differ in their regulation of expression and tissue distribution. This gene encodes the inducible isozyme. It is regulated by specific stimulatory events, suggesting that it is responsible for the prostanoid biosynthesis involved in inflammation and mitogenesis. prostaglandin-endoperoxide synthase 2 5743 NA
NKD2 ENSG00000145506 This gene encodes a member of a family of proteins that function as negative regulators of Wnt receptor signaling through interaction with Dishevelled family members. The encoded protein participates in the delivery of transforming growth factor alpha-containing vesicles to the cell membrane. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene. naked cuticle homolog 2 85409 NA
P2RY12 ENSG00000169313 The product of this gene belongs to the family of G-protein coupled receptors. This family has several receptor subtypes with different pharmacological selectivity, which overlaps in some cases, for various adenosine and uridine nucleotides. This receptor is involved in platelet aggregation, and is a potential target for the treatment of thromboembolisms and other clotting disorders. Mutations in this gene are implicated in bleeding disorder, platelet type 8 (BDPLT8). Alternative splicing results in multiple transcript variants of this gene. purinergic receptor P2Y12 64805 NA
WWTR1-AS1 ENSG00000241313 NA WWTR1 antisense RNA 1 100128025 NA
CTD-2184D3.5 ENSG00000259712 NA NA ENSG00000259712 NA
ATP2A1-AS1 ENSG00000260442 NA ATP2A1 antisense RNA 1 100289092 NA
FBLN1 ENSG00000077942 Fibulin 1 is a secreted glycoprotein that becomes incorporated into a fibrillar extracellular matrix. Calcium-binding is apparently required to mediate its binding to laminin and nidogen. It mediates platelet adhesion via binding fibrinogen. Four splice variants which differ in the 3’ end have been identified. Each variant encodes a different isoform, but no functional distinctions have been identified among the four variants. fibulin 1 2192 NA
PTGIR ENSG00000160013 The protein encoded by this gene is a member of the G-protein coupled receptor family 1 and has been shown to be a receptor for prostacyclin. Prostacyclin, the major product of cyclooxygenase in macrovascular endothelium, elicits a potent vasodilation and inhibition of platelet aggregation through binding to this receptor. prostaglandin I2 (prostacyclin) receptor (IP) 5739 NA
VCAM1 ENSG00000162692 This gene is a member of the Ig superfamily and encodes a cell surface sialoglycoprotein expressed by cytokine-activated endothelium. This type I membrane protein mediates leukocyte-endothelial cell adhesion and signal transduction, and may play a role in the development of atherosclerosis and rheumatoid arthritis. Three alternatively spliced transcripts encoding different isoforms have been described for this gene. vascular cell adhesion molecule 1 7412 NA
DDIT4L ENSG00000145358 NA DNA damage inducible transcript 4 like 115265 NA
C8orf4 ENSG00000176907 This gene encodes a small, monomeric, predominantly unstructured protein that functions as a positive regulator of the Wnt/beta-catenin signaling pathway. This protein interacts with a repressor of beta-catenin mediated transcription at nuclear speckles. It is thought to competitively block interactions of the repressor with beta-catenin, resulting in up-regulation of beta-catenin target genes. The encoded protein may also play a role in the NF-kappaB and ERK1/2 signaling pathways. Expression of this gene may play a role in the proliferation of several types of cancer including thyroid cancer, breast cancer and hematological malignancies. chromosome 8 open reading frame 4 56892 NA
IER3 ENSG00000137331 This gene functions in the protection of cells from Fas- or tumor necrosis factor type alpha-induced apoptosis. Partially degraded and unspliced transcripts are found after virus infection in vitro, but these transcripts are not found in vivo and do not generate a valid protein. immediate early response 3 8870 NA
CCDC102B ENSG00000150636 NA coiled-coil domain containing 102B 79839 NA
SERPINA1 ENSG00000197249 The protein encoded by this gene is secreted and is a serine protease inhibitor whose targets include elastase, plasmin, thrombin, trypsin, chymotrypsin, and plasminogen activator. Defects in this gene can cause emphysema or liver disease. Several transcript variants encoding the same protein have been found for this gene. serpin family A member 1 5265 NA
TMEM266 ENSG00000169758 NA transmembrane protein 266 123591 NA
FILIP1L ENSG00000168386 NA filamin A interacting protein 1 like 11259 NA
HES4 ENSG00000188290 NA hes family bHLH transcription factor 4 57801 NA
FBN1 ENSG00000166147 This gene encodes a member of the fibrillin family of proteins. The encoded preproprotein is proteolytically processed to generate two proteins including the extracellular matrix component fibrillin-1 and the protein hormone asprosin. Fibrillin-1 is an extracellular matrix glycoprotein that serves as a structural component of calcium-binding microfibrils. These microfibrils provide force-bearing structural support in elastic and nonelastic connective tissue throughout the body. Asprosin, secreted by white adipose tissue, has been shown to regulate glucose homeostasis. Mutations in this gene are associated with Marfan syndrome and the related MASS phenotype, as well as ectopia lentis syndrome, Weill-Marchesani syndrome, Shprintzen-Goldberg syndrome and neonatal progeroid syndrome. fibrillin 1 2200 NA
SHB ENSG00000107338 NA SH2 domain containing adaptor protein B 6461 NA
PPFIBP1 ENSG00000110841 The protein encoded by this gene is a member of the LAR protein-tyrosine phosphatase-interacting protein (liprin) family. Liprins interact with members of LAR family of transmembrane protein tyrosine phosphatases, which are known to be important for axon guidance and mammary gland development. It has been proposed that liprins are multivalent proteins that form complex structures and act as scaffolds for the recruitment and anchoring of LAR family of tyrosine phosphatases. This protein was found to interact with S100A4, a calcium-binding protein related to tumor invasiveness and metastasis. In vitro experiments demonstrated that the interaction inhibited the phosphorylation of this protein by protein kinase C and protein kinase CK2. Alternatively spliced transcript variants encoding distinct isoforms have been reported. PPFIA binding protein 1 8496 NA
VASN ENSG00000168140 NA vasorin 114990 NA
LUM ENSG00000139329 This gene encodes a member of the small leucine-rich proteoglycan (SLRP) family that includes decorin, biglycan, fibromodulin, keratocan, epiphycan, and osteoglycin. In these bifunctional molecules, the protein moiety binds collagen fibrils and the highly charged hydrophilic glycosaminoglycans regulate interfibrillar spacings. Lumican is the major keratan sulfate proteoglycan of the cornea but is also distributed in interstitial collagenous matrices throughout the body. Lumican may regulate collagen fibril organization and circumferential growth, corneal transparency, and epithelial cell migration and tissue repair. lumican 4060 NA
CTD-3128G10.6 ENSG00000269680 NA NA ENSG00000269680 NA
COL1A2 ENSG00000164692 This gene encodes the pro-alpha2 chain of type I collagen whose triple helix comprises two alpha1 chains and one alpha2 chain. Type I is a fibril-forming collagen found in most connective tissues and is abundant in bone, cornea, dermis and tendon. Mutations in this gene are associated with osteogenesis imperfecta types I-IV, Ehlers-Danlos syndrome type VIIB, recessive Ehlers-Danlos syndrome Classical type, idiopathic osteoporosis, and atypical Marfan syndrome. Symptoms associated with mutations in this gene, however, tend to be less severe than mutations in the gene for the alpha1 chain of type I collagen (COL1A1) reflecting the different role of alpha2 chains in matrix integrity. Three transcripts, resulting from the use of alternate polyadenylation signals, have been identified for this gene. collagen type I alpha 2 chain 1278 NA
NNAT ENSG00000053438 The protein encoded by this gene is a proteolipid that may be involved in the regulation of ion channels during brain development. The encoded protein may also play a role in forming and maintaining the structure of the nervous system. This gene is found within an intron of another gene, bladder cancer associated protein, but on the opposite strand. This gene is imprinted and is expressed only from the paternal allele. neuronatin 4826 NA
MAPK12 ENSG00000188130 Activation of members of the mitogen-activated protein kinase family is a major mechanism for transduction of extracellular signals. Stress-activated protein kinases are one subclass of MAP kinases. The protein encoded by this gene functions as a signal transducer during differentiation of myoblasts to myotubes. mitogen-activated protein kinase 12 6300 NA
ADAMTS7 ENSG00000136378 The protein encoded by this gene is a member of the ADAMTS (a disintegrin and metalloproteinase with thrombospondin motifs) family. Members of this family share several distinct protein modules, including a propeptide region, a metalloproteinase domain, a disintegrin-like domain, and a thrombospondin type 1 (TS) motif. Individual members of this family differ in the number of C-terminal TS motifs, and some have unique C-terminal domains. The encoded preproprotein is proteolytically processed to generate the mature enzyme. This enzyme contains two C-terminal TS motifs and may regulate vascular smooth muscle cell (VSMC) migration. Mutations in this gene may be associated with susceptibility to coronary artery disease. ADAM metallopeptidase with thrombospondin type 1 motif 7 11173 NA
HES1 ENSG00000114315 This protein belongs to the basic helix-loop-helix family of transcription factors. It is a transcriptional repressor of genes that require a bHLH protein for their transcription. The protein has a particular type of basic domain that contains a helix interrupting protein that binds to the N-box rather than the canonical E-box. hes family bHLH transcription factor 1 3280 NA
RP11-359P5.1 ENSG00000249996 NA NA ENSG00000249996 NA
MMP2 ENSG00000087245 This gene is a member of the matrix metalloproteinase (MMP) gene family, that are zinc-dependent enzymes capable of cleaving components of the extracellular matrix and molecules involved in signal transduction. The protein encoded by this gene is a gelatinase A, type IV collagenase, that contains three fibronectin type II repeats in its catalytic site that allow binding of denatured type IV and V collagen and elastin. Unlike most MMP family members, activation of this protein can occur on the cell membrane. This enzyme can be activated extracellularly by proteases, or, intracellulary by its S-glutathiolation with no requirement for proteolytical removal of the pro-domain. This protein is thought to be involved in multiple pathways including roles in the nervous system, endometrial menstrual breakdown, regulation of vascularization, and metastasis. Mutations in this gene have been associated with Winchester syndrome and Nodulosis-Arthropathy-Osteolysis (NAO) syndrome. Alternative splicing results in multiple transcript variants encoding different isoforms. matrix metallopeptidase 2 4313 NA
RP11-54O7.14 ENSG00000242590 NA NA ENSG00000242590 NA
GPNMB ENSG00000136235 The protein encoded by this gene is a type I transmembrane glycoprotein which shows homology to the pMEL17 precursor, a melanocyte-specific protein. GPNMB shows expression in the lowly metastatic human melanoma cell lines and xenografts but does not show expression in the highly metastatic cell lines. GPNMB may be involved in growth delay and reduction of metastatic potential. Two transcript variants encoding different isoforms have been found for this gene. glycoprotein nmb 10457 NA
AGRN ENSG00000188157 This gene encodes one of several proteins that are critical in the development of the neuromuscular junction (NMJ), as identified in mouse knock-out studies. The encoded protein contains several laminin G, Kazal type serine protease inhibitor, and epidermal growth factor domains. Additional post-translational modifications occur to add glycosaminoglycans and disulfide bonds. In one family with congenital myasthenic syndrome affecting limb-girdle muscles, a mutation in this gene was found. Alternative splicing results in multiple transcript variants encoding different isoforms. agrin 375790 NA
GNG12 ENSG00000172380 NA G protein subunit gamma 12 55970 NA
COL3A1 ENSG00000168542 This gene encodes the pro-alpha1 chains of type III collagen, a fibrillar collagen that is found in extensible connective tissues such as skin, lung, uterus, intestine and the vascular system, frequently in association with type I collagen. Mutations in this gene are associated with Ehlers-Danlos syndrome types IV, and with aortic and arterial aneurysms. Two transcripts, resulting from the use of alternate polyadenylation signals, have been identified for this gene. collagen type III alpha 1 chain 1281 NA
SEMA3F ENSG00000001617 This gene encodes a member of the semaphorin III family of secreted signaling proteins that are involved in axon guidance during neuronal development. The encoded protein contains an N-terminal Sema domain, an immunoglobulin loop and a C-terminal basic domain. This gene is expressed by the endothelial cells where it was found to act in an autocrine fashion to induce apoptosis, inhibit cell proliferation and survival, and function as an anti-tumorigenic agent. Alternative splicing results in multiple transcript variants encoding different isoforms. semaphorin 3F 6405 NA
WWTR1 ENSG00000018408 NA WW domain containing transcription regulator 1 25937 NA
PDLIM1 ENSG00000107438 This gene encodes a member of the enigma protein family. The protein contains two protein interacting domains, a PDZ domain at the amino terminal end and one to three LIM domains at the carboxyl terminal. It is a cytoplasmic protein associated with the cytoskeleton. The protein may function as an adapter to bring other LIM-interacting proteins to the cytoskeleton. Pseudogenes associated with this gene are located on chromosomes 3, 14 and 17. PDZ and LIM domain 1 9124 NA
KCNJ12 ENSG00000184185 This gene encodes an inwardly rectifying K+ channel which may be blocked by divalent cations. This protein is thought to be one of multiple inwardly rectifying channels which contribute to the cardiac inward rectifier current (IK1). The gene is located within the Smith-Magenis syndrome region on chromosome 17. potassium voltage-gated channel subfamily J member 12 3768 NA
NA ENSG00000255905 NA NA NA TRUE
UGDH ENSG00000109814 The protein encoded by this gene converts UDP-glucose to UDP-glucuronate and thereby participates in the biosynthesis of glycosaminoglycans such as hyaluronan, chondroitin sulfate, and heparan sulfate. These glycosylated compounds are common components of the extracellular matrix and likely play roles in signal transduction, cell migration, and cancer growth and metastasis. The expression of this gene is up-regulated by transforming growth factor beta and down-regulated by hypoxia. Alternative splicing results in multiple transcript variants. UDP-glucose 6-dehydrogenase 7358 NA
SMOC2 ENSG00000112562 This gene encodes a member of the SPARC family (secreted protein acidic and rich in cysteine/osteonectin/BM-40), which are highly expressed during embryogenesis and wound healing. The gene product is a matricellular protein which promotes matrix assembly and can stimulate endothelial cell proliferation and migration, as well as angiogenic activity. Associated with pulmonary function, this secretory gene product contains a Kazal domain, two thymoglobulin type-1 domains, and two EF-hand calcium-binding domains. The encoded protein may serve as a target for controlling angiogenesis in tumor growth and myocardial ischemia. Alternative splicing results in multiple transcript variants. SPARC related modular calcium binding 2 64094 NA
RBP1 ENSG00000114115 This gene encodes the carrier protein involved in the transport of retinol (vitamin A alcohol) from the liver storage site to peripheral tissue. Vitamin A is a fat-soluble vitamin necessary for growth, reproduction, differentiation of epithelial tissues, and vision. Multiple transcript variants encoding different isoforms have been found for this gene. retinol binding protein 1 5947 NA
RP5-906A24.2 ENSG00000266101 NA NA ENSG00000266101 NA
NEURL1 ENSG00000107954 NA neuralized E3 ubiquitin protein ligase 1 9148 NA
write.table(as.factor(out$query), paste0("../utilities/GTEX2013_sparse_load_voom/gene_names_clus_",20,".txt"), col.names = FALSE,
            row.names=FALSE, quote=FALSE);
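
Because the same export is repeated for each factor, it could be wrapped in a loop. The following is a minimal sketch on toy data, not the code used in this analysis: `gene_list_toy`, `out_dir`, and `written` are hypothetical names, and the files go to a temporary directory rather than `../utilities/`.

```r
# Toy stand-in for gene_list: one row per factor, one column per top gene.
gene_list_toy <- rbind(
  c("ENSG00000171401", "ENSG00000170477"),
  c("ENSG00000133392", "ENSG00000011465")
)

# Hypothetical output location (assumption for this sketch only).
out_dir <- tempdir()

# Write one file per factor and collect the paths that were written.
written <- vapply(seq_len(nrow(gene_list_toy)), function(k) {
  path <- file.path(out_dir, paste0("gene_names_clus_", k, ".txt"))
  # One Ensembl ID per line, no header, no row names, no quotes.
  write.table(gene_list_toy[k, ], path,
              col.names = FALSE, row.names = FALSE, quote = FALSE)
  path
}, character(1))
```

Each resulting file holds one Ensembl gene ID per line, the same layout produced by the single-factor `write.table` call above.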

Transpose the matrix to ensure sparse factors

## Voom counts transpose

# Expected complete Log likelihood at iteration 100: -8.18694e+07
# Marginal log likelihood at iteration 100: inf
# Residual variance at iteration 100: 1.72648
# Residual sum of squares at iteration 100: 6.53386e+07

## Sqrt counts transpose

# Expected complete Log likelihood at iteration 100: -3.9495e+08
# Marginal log likelihood at iteration 100: -inf
# Residual variance at iteration 100: 638.648
# Residual sum of squares at iteration 100: 2.35575e+10

## counts transpose

GTEx 2013 Factor analysis (sparse factors: sqrt counts)

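# Load the SFA outputs for the transposed fit on sqrt counts.
lambda_out <- read.table("../sfa_outputs/GTEX2013_transpose/sqrt_counts_gtex/gtex_sqrt_counts_transpose_lambda.out")
f_out <- read.table("../sfa_outputs/GTEX2013_transpose/sqrt_counts_gtex/gtex_sqrt_counts_transpose_F.out")

# Keep the first 15 characters of each name, i.e. the Ensembl gene ID without its version suffix.
gene_names <- as.vector(as.matrix(read.table("../sfa_inputs/gene_names_GTEX_V6.txt")))
gene_names <- substring(gene_names, 1, 15)
xli <- gene_names

# In the transposed fit the top features are extracted from lambda_out rather than f_out.
indices_mat <- SFA.ExtractTopFeatures(lambda_out, top_features = 100, options = "min", mult.annotate = TRUE)

# Map each row of gene indices to the corresponding Ensembl IDs (one row per factor).
gene_list <- do.call(rbind, lapply(seq_len(nrow(indices_mat)), function(x) gene_names[indices_mat[x, ]]))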
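
To make the row-wise lookup concrete, here is a small sketch with made-up inputs. `names_toy` and `idx_toy` are hypothetical stand-ins for `gene_names` and `indices_mat`. The sketch assumes, as the code above does, that each row of the index matrix holds positions into the gene-name vector.

```r
# Hypothetical gene names and a 2 x 2 index matrix (one row per factor).
names_toy <- c("geneA", "geneB", "geneC", "geneD")
idx_toy <- rbind(c(3, 1), c(2, 4))

# Same pattern as above: look up names one row at a time, then stack the rows.
by_row <- do.call(rbind, lapply(seq_len(nrow(idx_toy)),
                                function(x) names_toy[idx_toy[x, ]]))

# A vectorized alternative: index once, then restore the matrix shape.
# R flattens idx_toy column by column and matrix() fills column by column,
# so each name lands back in the cell its index came from.
direct <- matrix(names_toy[idx_toy], nrow = nrow(idx_toy))
```

Both forms give a character matrix with one row of gene names per factor. The vectorized form avoids the `lapply` call, which may matter little at 100 genes per factor but reads more directly.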

Factor 1 Annotations

out <- mygene::queryMany(gene_list[1,],  scopes="ensembl.gene", fields=c("name", "summary", "symbol"), species="human");
## Finished
## Pass returnall=TRUE to return lists of duplicate or missing query terms.
kable(as.data.frame(out))
name query symbol summary X_id notfound
myosin, heavy chain 11, smooth muscle ENSG00000133392 MYH11 The protein encoded by this gene is a smooth muscle myosin belonging to the myosin heavy chain family. The gene product is a subunit of a hexameric protein that consists of two heavy chain subunits and two pairs of non-identical light chain subunits. It functions as a major contractile protein, converting chemical energy into mechanical energy through the hydrolysis of ATP. The gene encoding a human ortholog of rat NUDE1 is transcribed from the reverse strand of this gene, and its 3’ end overlaps with that of the latter. The pericentric inversion of chromosome 16 [inv(16)(p13q22)] produces a chimeric transcript that encodes a protein consisting of the first 165 residues from the N terminus of core-binding factor beta in a fusion with the C-terminal portion of the smooth muscle myosin heavy chain. This chromosomal rearrangement is associated with acute myeloid leukemia of the M4Eo subtype. Alternative splicing generates isoforms that are differentially expressed, with ratios changing during muscle cell maturation. Alternatively spliced transcript variants encoding different isoforms have been identified. 4629 NA
decorin ENSG00000011465 DCN This gene encodes a member of the small leucine-rich proteoglycan family of proteins. Alternative splicing results in multiple transcript variants, at least one of which encodes a preproprotein that is proteolytically processed to generate the mature protein. This protein plays a role in collagen fibril assembly. Binding of this protein to multiple cell surface receptors mediates its role in tumor suppression, including a stimulatory effect on autophagy and inflammation and an inhibitory effect on angiogenesis and tumorigenesis. This gene and the related gene biglycan are thought to be the result of a gene duplication. Mutations in this gene are associated with congenital stromal corneal dystrophy in human patients. 1634 NA
actin, beta ENSG00000075624 ACTB This gene encodes one of six different actin proteins. Actins are highly conserved proteins that are involved in cell motility, structure, and integrity. This actin is a major constituent of the contractile apparatus and one of the two nonmuscle cytoskeletal actins. 60 NA
collagen type VI alpha 3 chain ENSG00000163359 COL6A3 This gene encodes the alpha-3 chain, one of the three alpha chains of type VI collagen, a beaded filament collagen found in most connective tissues. The alpha-3 chain of type VI collagen is much larger than the alpha-1 and -2 chains. This difference in size is largely due to an increase in the number of subdomains, similar to von Willebrand Factor type A domains, that are found in the amino terminal globular domain of all the alpha chains. These domains have been shown to bind extracellular matrix proteins, an interaction that explains the importance of this collagen in organizing matrix components. Mutations in the type VI collagen genes are associated with Bethlem myopathy, a rare autosomal dominant proximal myopathy with early childhood onset. Mutations in this gene are also a cause of Ullrich congenital muscular dystrophy, also referred to as Ullrich scleroatonic muscular dystrophy, an autosomal recessive congenital myopathy that is more severe than Bethlem myopathy. Multiple transcript variants have been identified, but the full-length nature of only some of these variants has been described. 1293 NA
secreted protein acidic and cysteine rich ENSG00000113140 SPARC This gene encodes a cysteine-rich acidic matrix-associated protein. The encoded protein is required for the collagen in bone to become calcified but is also involved in extracellular matrix synthesis and promotion of changes to cell shape. The gene product has been associated with tumor suppression but has also been correlated with metastasis based on changes to cell shape which can promote tumor cell invasion. Three transcript variants encoding different isoforms have been found for this gene. 6678 NA
lymphocyte cytosolic protein 1 ENSG00000136167 LCP1 Plastins are a family of actin-binding proteins that are conserved throughout eukaryote evolution and expressed in most tissues of higher eukaryotes. In humans, two ubiquitous plastin isoforms (L and T) have been identified. Plastin 1 (otherwise known as Fimbrin) is a third distinct plastin isoform which is specifically expressed at high levels in the small intestine. The L isoform is expressed only in hemopoietic cell lineages, while the T isoform has been found in all other normal cells of solid tissues that have replicative potential (fibroblasts, endothelial cells, epithelial cells, melanocytes, etc.). However, L-plastin has been found in many types of malignant human cells of non-hemopoietic origin suggesting that its expression is induced accompanying tumorigenesis in solid tissues. 3936 NA
lysosomal protein transmembrane 5 ENSG00000162511 LAPTM5 This gene encodes a transmembrane receptor that is associated with lysosomes. The encoded protein, also known as E3 protein, may play a role in hematopoiesis. 7805 NA
cysteine and glycine rich protein 1 ENSG00000159176 CSRP1 This gene encodes a member of the cysteine-rich protein (CSRP) family. This gene family includes a group of LIM domain proteins, which may be involved in regulatory processes important for development and cellular differentiation. The LIM/double zinc-finger motif found in this gene product occurs in proteins with critical functions in gene regulation, cell growth, and somatic differentiation. Alternatively spliced transcript variants have been described. 1465 NA
pentraxin 3 ENSG00000163661 PTX3 NA 5806 NA
apolipoprotein D ENSG00000189058 APOD This gene encodes a component of high density lipoprotein that has no marked similarity to other apolipoprotein sequences. It has a high degree of homology to plasma retinol-binding protein and other members of the alpha 2 microglobulin protein superfamily of carrier proteins, also known as lipocalins. This glycoprotein is closely associated with the enzyme lecithin:cholesterol acyltransferase - an enzyme involved in lipoprotein metabolism. 347 NA
beta-2-microglobulin ENSG00000166710 B2M This gene encodes a serum protein found in association with the major histocompatibility complex (MHC) class I heavy chain on the surface of nearly all nucleated cells. The protein has a predominantly beta-pleated sheet structure that can form amyloid fibrils in some pathological conditions. The encoded antimicrobial protein displays antibacterial activity in amniotic fluid. A mutation in this gene has been shown to result in hypercatabolic hypoproteinemia. 567 NA
actin, gamma 2, smooth muscle, enteric ENSG00000163017 ACTG2 Actins are highly conserved proteins that are involved in various types of cell motility and in the maintenance of the cytoskeleton. Three types of actins, alpha, beta and gamma, have been identified in vertebrates. Alpha actins are found in muscle tissues and are a major constituent of the contractile apparatus. The beta and gamma actins co-exist in most cell types as components of the cytoskeleton and as mediators of internal cell motility. This gene encodes actin gamma 2; a smooth muscle actin found in enteric tissues. Alternative splicing results in multiple transcript variants encoding distinct isoforms. Based on similarity to peptide cleavage of related actins, the mature protein of this gene is formed by removal of two N-terminal peptides. 72 NA
CD53 molecule ENSG00000143119 CD53 The protein encoded by this gene is a member of the transmembrane 4 superfamily, also known as the tetraspanin family. Most of these members are cell-surface proteins that are characterized by the presence of four hydrophobic domains. The proteins mediate signal transduction events that play a role in the regulation of cell development, activation, growth and motility. This encoded protein is a cell surface glycoprotein that is known to complex with integrins. It contributes to the transduction of CD2-generated signals in T cells and natural killer cells and has been suggested to play a role in growth regulation. Familial deficiency of this gene has been linked to an immunodeficiency associated with recurrent infectious diseases caused by bacteria, fungi and viruses. Alternative splicing results in multiple transcript variants. 963 NA
collagen type I alpha 2 chain ENSG00000164692 COL1A2 This gene encodes the pro-alpha2 chain of type I collagen whose triple helix comprises two alpha1 chains and one alpha2 chain. Type I is a fibril-forming collagen found in most connective tissues and is abundant in bone, cornea, dermis and tendon. Mutations in this gene are associated with osteogenesis imperfecta types I-IV, Ehlers-Danlos syndrome type VIIB, recessive Ehlers-Danlos syndrome Classical type, idiopathic osteoporosis, and atypical Marfan syndrome. Symptoms associated with mutations in this gene, however, tend to be less severe than mutations in the gene for the alpha1 chain of type I collagen (COL1A1) reflecting the different role of alpha2 chains in matrix integrity. Three transcripts, resulting from the use of alternate polyadenylation signals, have been identified for this gene. 1278 NA
transgelin ENSG00000149591 TAGLN The protein encoded by this gene is a transformation and shape-change sensitive actin cross-linking/gelling protein found in fibroblasts and smooth muscle. Its expression is down-regulated in many cell lines, and this down-regulation may be an early and sensitive marker for the onset of transformation. A functional role of this protein is unclear. Two transcript variants encoding the same protein have been found for this gene. 6876 NA
CD248 molecule ENSG00000174807 CD248 NA 57124 NA
sortilin-related receptor, L(DLR class) A repeats containing ENSG00000137642 SORL1 This gene encodes a mosaic protein that belongs to at least two families: the vacuolar protein sorting 10 (VPS10) domain-containing receptor family, and the low density lipoprotein receptor (LDLR) family. The encoded protein also contains fibronectin type III repeats and an epidermal growth factor repeat. The encoded preproprotein is proteolytically processed to generate the mature receptor, which likely plays roles in endocytosis and sorting. Mutations in this gene may be associated with Alzheimer’s disease. 6653 NA
collagen type XII alpha 1 chain ENSG00000111799 COL12A1 This gene encodes the alpha chain of type XII collagen, a member of the FACIT (fibril-associated collagens with interrupted triple helices) collagen family. Type XII collagen is a homotrimer found in association with type I collagen, an association that is thought to modify the interactions between collagen I fibrils and the surrounding matrix. Alternatively spliced transcript variants encoding different isoforms have been identified. 1303 NA
crystallin alpha B ENSG00000109846 CRYAB Mammalian lens crystallins are divided into alpha, beta, and gamma families. Alpha crystallins are composed of two gene products: alpha-A and alpha-B, for acidic and basic, respectively. Alpha crystallins can be induced by heat shock and are members of the small heat shock protein (HSP20) family. They act as molecular chaperones although they do not renature proteins and release them in the fashion of a true chaperone; instead they hold them in large soluble aggregates. Post-translational modifications decrease the ability to chaperone. These heterogeneous aggregates consist of 30-40 subunits; the alpha-A and alpha-B subunits have a 3:1 ratio, respectively. Two additional functions of alpha crystallins are an autokinase activity and participation in the intracellular architecture. The encoded protein has been identified as a moonlighting protein based on its ability to perform mechanistically distinct functions. Alpha-A and alpha-B gene products are differentially expressed; alpha-A is preferentially restricted to the lens and alpha-B is expressed widely in many tissues and organs. Elevated expression of alpha-B crystallin occurs in many neurological diseases; a missense mutation cosegregated in a family with a desmin-related myopathy. Alternative splicing results in multiple transcript variants. 1410 NA
actin, alpha 1, skeletal muscle ENSG00000143632 ACTA1 The product encoded by this gene belongs to the actin family of proteins, which are highly conserved proteins that play a role in cell motility, structure and integrity. Alpha, beta and gamma actin isoforms have been identified, with alpha actins being a major constituent of the contractile apparatus, while beta and gamma actins are involved in the regulation of cell motility. This actin is an alpha actin that is found in skeletal muscle. Mutations in this gene cause nemaline myopathy type 3, congenital myopathy with excess of thin myofilaments, congenital myopathy with cores, and congenital myopathy with fiber-type disproportion, diseases that lead to muscle fiber defects. 58 NA
leiomodin 1 ENSG00000163431 LMOD1 The leiomodin 1 protein has a putative membrane-spanning region and 2 types of tandemly repeated blocks. The transcript is expressed in all tissues tested, with the highest levels in thyroid, eye muscle, skeletal muscle, and ovary. Increased expression of leiomodin 1 may be linked to Graves’ disease and thyroid-associated ophthalmopathy. 25802 NA
coronin 1A ENSG00000102879 CORO1A This gene encodes a member of the WD repeat protein family. WD repeats are minimally conserved regions of approximately 40 amino acids typically bracketed by gly-his and trp-asp (GH-WD), which may facilitate formation of heterotrimeric or multiprotein complexes. Members of this family are involved in a variety of cellular processes, including cell cycle progression, signal transduction, apoptosis, and gene regulation. Alternative splicing results in multiple transcript variants. A related pseudogene has been defined on chromosome 16. 11151 NA
NDRG family member 4 ENSG00000103034 NDRG4 This gene is a member of the N-myc downregulated gene family which belongs to the alpha/beta hydrolase superfamily. The protein encoded by this gene is a cytoplasmic protein that is required for cell cycle progression and survival in primary astrocytes and may be involved in the regulation of mitogenic signalling in vascular smooth muscles cells. Alternative splicing results in multiple transcripts encoding different isoforms. 65009 NA
protein tyrosine phosphatase, non-receptor type 6 ENSG00000111679 PTPN6 The protein encoded by this gene is a member of the protein tyrosine phosphatase (PTP) family. PTPs are known to be signaling molecules that regulate a variety of cellular processes including cell growth, differentiation, mitotic cycle, and oncogenic transformation. N-terminal part of this PTP contains two tandem Src homolog (SH2) domains, which act as protein phospho-tyrosine binding domains, and mediate the interaction of this PTP with its substrates. This PTP is expressed primarily in hematopoietic cells, and functions as an important regulator of multiple signaling pathways in hematopoietic cells. This PTP has been shown to interact with, and dephosphorylate a wide spectrum of phospho-proteins involved in hematopoietic cell signaling. Multiple alternatively spliced variants of this gene, which encode distinct isoforms, have been reported. 5777 NA
major histocompatibility complex, class I, C ENSG00000204525 HLA-C HLA-C belongs to the HLA class I heavy chain paralogues. This class I molecule is a heterodimer consisting of a heavy chain and a light chain (beta-2 microglobulin). The heavy chain is anchored in the membrane. Class I molecules play a central role in the immune system by presenting peptides derived from endoplasmic reticulum lumen. They are expressed in nearly all cells. The heavy chain is approximately 45 kDa and its gene contains 8 exons. Exon one encodes the leader peptide, exons 2 and 3 encode the alpha1 and alpha2 domain, which both bind the peptide, exon 4 encodes the alpha3 domain, exon 5 encodes the transmembrane region, and exons 6 and 7 encode the cytoplasmic tail. Polymorphisms within exon 2 and exon 3 are responsible for the peptide binding specificity of each class one molecule. Typing for these polymorphisms is routinely done for bone marrow and kidney transplantation. Over one hundred HLA-C alleles have been described 3107 NA
serglycin ENSG00000122862 SRGN This gene encodes a protein best known as a hematopoietic cell granule proteoglycan. Proteoglycans stored in the secretory granules of many hematopoietic cells also contain a protease-resistant peptide core, which may be important for neutralizing hydrolytic enzymes. This encoded protein was found to be associated with the macromolecular complex of granzymes and perforin, which may serve as a mediator of granule-mediated apoptosis. Two transcript variants, only one of them protein-coding, have been found for this gene. 5552 NA
epithelial membrane protein 1 ENSG00000134531 EMP1 NA 2012 NA
matrix metallopeptidase 2 ENSG00000087245 MMP2 This gene is a member of the matrix metalloproteinase (MMP) gene family, that are zinc-dependent enzymes capable of cleaving components of the extracellular matrix and molecules involved in signal transduction. The protein encoded by this gene is a gelatinase A, type IV collagenase, that contains three fibronectin type II repeats in its catalytic site that allow binding of denatured type IV and V collagen and elastin. Unlike most MMP family members, activation of this protein can occur on the cell membrane. This enzyme can be activated extracellularly by proteases, or, intracellulary by its S-glutathiolation with no requirement for proteolytical removal of the pro-domain. This protein is thought to be involved in multiple pathways including roles in the nervous system, endometrial menstrual breakdown, regulation of vascularization, and metastasis. Mutations in this gene have been associated with Winchester syndrome and Nodulosis-Arthropathy-Osteolysis (NAO) syndrome. Alternative splicing results in multiple transcript variants encoding different isoforms. 4313 NA
LDL receptor related protein 1 ENSG00000123384 LRP1 This gene encodes a member of the low-density lipoprotein receptor family of proteins. The encoded preproprotein is proteolytically processed by furin to generate 515 kDa and 85 kDa subunits that form the mature receptor (PMID: 8546712). This receptor is involved in several cellular processes, including intracellular signaling, lipid homeostasis, and clearance of apoptotic cells. In addition, the encoded protein is necessary for the alpha 2-macroglobulin-mediated clearance of secreted amyloid precursor protein and beta-amyloid, the main component of amyloid plaques found in Alzheimer patients. Expression of this gene decreases with age and has been found to be lower than controls in brain tissue from Alzheimer’s disease patients. 4035 NA
adrenomedullin ENSG00000148926 ADM The protein encoded by this gene is a preprohormone which is cleaved to form two biologically active peptides, adrenomedullin and proadrenomedullin N-terminal 20 peptide. Adrenomedullin is a 52 aa peptide with several functions, including vasodilation, regulation of hormone secretion, promotion of angiogenesis, and antimicrobial activity. The antimicrobial activity is antibacterial, as the peptide has been shown to kill E. coli and S. aureus at low concentration. 133 NA
calponin 1 ENSG00000130176 CNN1 NA 1264 NA
actin binding LIM protein 1 ENSG00000099204 ABLIM1 This gene encodes a cytoskeletal LIM protein that binds to actin filaments via a domain that is homologous to erythrocyte dematin. LIM domains, found in over 60 proteins, play key roles in the regulation of developmental pathways. LIM domains also function as protein-binding interfaces, mediating specific protein-protein interactions. The protein encoded by this gene could mediate such interactions between actin filaments and cytoplasmic targets. Alternatively spliced transcript variants encoding different isoforms have been identified. 3983 NA
fibronectin 1 ENSG00000115414 FN1 This gene encodes fibronectin, a glycoprotein present in a soluble dimeric form in plasma, and in a dimeric or multimeric form at the cell surface and in extracellular matrix. The encoded preproprotein is proteolytically processed to generate the mature protein. Fibronectin is involved in cell adhesion and migration processes including embryogenesis, wound healing, blood coagulation, host defense, and metastasis. The gene has three regions subject to alternative splicing, with the potential to produce 20 different transcript variants, at least one of which encodes an isoform that undergoes proteolytic processing. The full-length nature of some variants has not been determined. 2335 NA
lipopolysaccharide induced TNF factor ENSG00000189067 LITAF Lipopolysaccharide is a potent stimulator of monocytes and macrophages, causing secretion of tumor necrosis factor-alpha (TNF-alpha) and other inflammatory mediators. This gene encodes lipopolysaccharide-induced TNF-alpha factor, which is a DNA-binding protein and can mediate the TNF-alpha expression by direct binding to the promoter region of the TNF-alpha gene. The transcription of this gene is induced by tumor suppressor p53 and has been implicated in the p53-induced apoptotic pathway. Mutations in this gene cause Charcot-Marie-Tooth disease type 1C (CMT1C) and may be involved in the carcinogenesis of extramammary Paget’s disease (EMPD). Multiple alternatively spliced transcript variants have been found for this gene. 9516 NA
complement factor D ENSG00000197766 CFD This gene encodes a member of the S1, or chymotrypsin, family of serine peptidases. This protease catalyzes the cleavage of factor B, the rate-limiting step of the alternative pathway of complement activation. This protein also functions as an adipokine, a cell signaling protein secreted by adipocytes, which regulates insulin secretion in mice. Mutations in this gene underlie complement factor D deficiency, which is associated with recurrent bacterial meningitis infections in human patients. Alternative splicing of this gene results in multiple transcript variants. At least one of these variants encodes a preproprotein that is proteolytically processed to generate the mature protease. 1675 NA
sparc/osteonectin, cwcv and kazal-like domains proteoglycan (testican) 1 ENSG00000152377 SPOCK1 This gene encodes the protein core of a seminal plasma proteoglycan containing chondroitin- and heparan-sulfate chains. The protein’s function is unknown, although similarity to thyropin-type cysteine protease-inhibitors suggests its function may be related to protease inhibition. 6695 NA
murine retrovirus integration site 1 homolog ENSG00000072952 MRVI1 This gene is similar to a putative mouse tumor suppressor gene (Mrvi1) that is frequently disrupted by mouse AIDS-related virus (MRV). The encoded protein, which is found in the membrane of the endoplasmic reticulum, is similar to Jaw1, a lymphoid-restricted protein whose expression is down-regulated during lymphoid differentiation. This protein is a substrate of cGMP-dependent kinase-1 (PKG1) that can function as a regulator of IP3-induced calcium release. Studies in mouse suggest that MRV integration at Mrvi1 induces myeloid leukemia by altering the expression of a gene important for myeloid cell growth and/or differentiation, and thus this gene may function as a myeloid leukemia tumor suppressor gene. Several alternatively spliced transcript variants encoding different isoforms have been found for this gene, and alternative translation start sites, including a non-AUG (CUG) start site, are used. 10335 NA
LIM domain containing preferred translocation partner in lipoma ENSG00000145012 LPP This gene encodes a member of a subfamily of LIM domain proteins that are characterized by an N-terminal proline-rich region and three C-terminal LIM domains. The encoded protein localizes to the cell periphery in focal adhesions and may be involved in cell-cell adhesion and cell motility. This protein also shuttles through the nucleus and may function as a transcriptional co-activator. This gene is located at the junction of certain disease-related chromosomal translocations, which result in the expression of chimeric proteins that may promote tumor growth. Alternative splicing results in multiple transcript variants. 4026 NA
coiled-coil domain containing 80 ENSG00000091986 CCDC80 NA 151887 NA
hematopoietic cell-specific Lyn substrate 1 ENSG00000180353 HCLS1 NA 3059 NA
polymerase I and transcript release factor ENSG00000177469 PTRF This gene encodes a protein that enables the dissociation of paused ternary polymerase I transcription complexes from the 3’ end of pre-rRNA transcripts. This protein regulates rRNA transcription by promoting the dissociation of transcription complexes and the reinitiation of polymerase I on nascent rRNA transcripts. This protein also localizes to caveolae at the plasma membrane and is thought to play a critical role in the formation of caveolae and the stabilization of caveolins. This protein translocates from caveolae to the cytoplasm after insulin stimulation. Caveolae contain truncated forms of this protein and may be the site of phosphorylation-dependent proteolysis. This protein is also thought to modify lipid metabolism and insulin-regulated gene expression. Mutations in this gene result in a disorder characterized by generalized lipodystrophy and muscular dystrophy. 284119 NA
lumican ENSG00000139329 LUM This gene encodes a member of the small leucine-rich proteoglycan (SLRP) family that includes decorin, biglycan, fibromodulin, keratocan, epiphycan, and osteoglycin. In these bifunctional molecules, the protein moiety binds collagen fibrils and the highly charged hydrophilic glycosaminoglycans regulate interfibrillar spacings. Lumican is the major keratan sulfate proteoglycan of the cornea but is also distributed in interstitial collagenous matrices throughout the body. Lumican may regulate collagen fibril organization and circumferential growth, corneal transparency, and epithelial cell migration and tissue repair. 4060 NA
pleckstrin and Sec7 domain containing 4 ENSG00000125637 PSD4 NA 23550 NA
FGR proto-oncogene, Src family tyrosine kinase ENSG00000000938 FGR This gene is a member of the Src family of protein tyrosine kinases (PTKs). The encoded protein contains N-terminal sites for myristylation and palmitylation, a PTK domain, and SH2 and SH3 domains which are involved in mediating protein-protein interactions with phosphotyrosine-containing and proline-rich motifs, respectively. The protein localizes to plasma membrane ruffles, and functions as a negative regulator of cell migration and adhesion triggered by the beta-2 integrin signal transduction pathway. Infection with Epstein-Barr virus results in the overexpression of this gene. Multiple alternatively spliced variants, encoding the same protein, have been identified. 2268 NA
gelsolin ENSG00000148180 GSN The protein encoded by this gene binds to the ‘plus’ ends of actin monomers and filaments to prevent monomer exchange. The encoded calcium-regulated protein functions in both assembly and disassembly of actin filaments. Defects in this gene are a cause of familial amyloidosis Finnish type (FAF). Multiple transcript variants encoding several different isoforms have been found for this gene. 2934 NA
myosin light chain 9 ENSG00000101335 MYL9 Myosin, a structural component of muscle, consists of two heavy chains and four light chains. The protein encoded by this gene is a myosin light chain that may regulate muscle contraction by modulating the ATPase activity of myosin heads. The encoded protein binds calcium and is activated by myosin light chain kinase. Two transcript variants encoding different isoforms have been found for this gene. 10398 NA
NA ENSG00000263335 AF001548.5 NA ENSG00000263335 NA
Ras association domain family member 2 ENSG00000101265 RASSF2 This gene encodes a protein that contains a Ras association domain. Similar to its cattle and sheep counterparts, this gene is located near the prion gene. Two alternatively spliced transcripts encoding the same isoform have been reported. 9770 NA
cathepsin K ENSG00000143387 CTSK The protein encoded by this gene is a lysosomal cysteine proteinase involved in bone remodeling and resorption. This protein, which is a member of the peptidase C1 protein family, is predominantly expressed in osteoclasts. However, the encoded protein is also expressed in a significant fraction of human breast cancers, where it could contribute to tumor invasiveness. Mutations in this gene are the cause of pycnodysostosis, an autosomal recessive disease characterized by osteosclerosis and short stature. 1513 NA
laminin subunit beta 1 ENSG00000091136 LAMB1 Laminins, a family of extracellular matrix glycoproteins, are the major noncollagenous constituent of basement membranes. They have been implicated in a wide variety of biological processes including cell adhesion, differentiation, migration, signaling, neurite outgrowth and metastasis. Laminins are composed of 3 non identical chains: laminin alpha, beta and gamma (formerly A, B1, and B2, respectively) and they form a cruciform structure consisting of 3 short arms, each formed by a different chain, and a long arm composed of all 3 chains. Each laminin chain is a multidomain protein encoded by a distinct gene. Several isoforms of each chain have been described. Different alpha, beta and gamma chain isomers combine to give rise to different heterotrimeric laminin isoforms which are designated by Arabic numerals in the order of their discovery, i.e. alpha1beta1gamma1 heterotrimer is laminin 1. The biological functions of the different chains and trimer molecules are largely unknown, but some of the chains have been shown to differ with respect to their tissue distribution, presumably reflecting diverse functions in vivo. This gene encodes the beta chain isoform laminin, beta 1. The beta 1 chain has 7 structurally distinct domains which it shares with other beta chain isomers. The C-terminal helical region containing domains I and II are separated by domain alpha, domains III and V contain several EGF-like repeats, and domains IV and VI have a globular conformation. Laminin, beta 1 is expressed in most tissues that produce basement membranes, and is one of the 3 chains constituting laminin 1, the first laminin isolated from Engelbreth-Holm-Swarm (EHS) tumor. A sequence in the beta 1 chain that is involved in cell attachment, chemotaxis, and binding to the laminin receptor was identified and shown to have the capacity to inhibit metastasis. 3912 NA
ras-related C3 botulinum toxin substrate 2 (rho family, small GTP binding protein Rac2) ENSG00000128340 RAC2 This gene encodes a member of the Ras superfamily of small guanosine triphosphate (GTP)-metabolizing proteins. The encoded protein localizes to the plasma membrane, where it regulates diverse processes, such as secretion, phagocytosis, and cell polarization. Activity of this protein is also involved in the generation of reactive oxygen species. Mutations in this gene are associated with neutrophil immunodeficiency syndrome. There is a pseudogene for this gene on chromosome 6. 5880 NA
integrin subunit beta 2 ENSG00000160255 ITGB2 This gene encodes an integrin beta chain, which combines with multiple different alpha chains to form different integrin heterodimers. Integrins are integral cell-surface proteins that participate in cell adhesion as well as cell-surface mediated signalling. The encoded protein plays an important role in immune response and defects in this gene cause leukocyte adhesion deficiency. Alternative splicing results in multiple transcript variants. 3689 NA
lamin A/C ENSG00000160789 LMNA The nuclear lamina consists of a two-dimensional matrix of proteins located next to the inner nuclear membrane. The lamin family of proteins make up the matrix and are highly conserved in evolution. During mitosis, the lamina matrix is reversibly disassembled as the lamin proteins are phosphorylated. Lamin proteins are thought to be involved in nuclear stability, chromatin structure and gene expression. Vertebrate lamins consist of two types, A and B. Alternative splicing results in multiple transcript variants. Mutations in this gene lead to several diseases: Emery-Dreifuss muscular dystrophy, familial partial lipodystrophy, limb girdle muscular dystrophy, dilated cardiomyopathy, Charcot-Marie-Tooth disease, and Hutchinson-Gilford progeria syndrome. 4000 NA
neuropilin 1 ENSG00000099250 NRP1 This gene encodes one of two neuropilins, which contain specific protein domains which allow them to participate in several different types of signaling pathways that control cell migration. Neuropilins contain a large N-terminal extracellular domain, made up of complement-binding, coagulation factor V/VIII, and meprin domains. These proteins also contain a short membrane-spanning domain and a small cytoplasmic domain. Neuropilins bind many ligands and various types of co-receptors; they affect cell survival, migration, and attraction. Some of the ligands and co-receptors bound by neuropilins are vascular endothelial growth factor (VEGF) and semaphorin family members. Several alternatively spliced transcript variants that encode different protein isoforms have been described for this gene. 8829 NA
NA ENSG00000259716 NA NA NA TRUE
laminin subunit gamma 1 ENSG00000135862 LAMC1 Laminins, a family of extracellular matrix glycoproteins, are the major noncollagenous constituent of basement membranes. They have been implicated in a wide variety of biological processes including cell adhesion, differentiation, migration, signaling, neurite outgrowth and metastasis. Laminins, composed of 3 non identical chains: laminin alpha, beta and gamma (formerly A, B1, and B2, respectively), have a cruciform structure consisting of 3 short arms, each formed by a different chain, and a long arm composed of all 3 chains. Each laminin chain is a multidomain protein encoded by a distinct gene. Several isoforms of each chain have been described. Different alpha, beta and gamma chain isomers combine to give rise to different heterotrimeric laminin isoforms which are designated by Arabic numerals in the order of their discovery, i.e. alpha1beta1gamma1 heterotrimer is laminin 1. The biological functions of the different chains and trimer molecules are largely unknown, but some of the chains have been shown to differ with respect to their tissue distribution, presumably reflecting diverse functions in vivo. This gene encodes the gamma chain isoform laminin, gamma 1. The gamma 1 chain, formerly thought to be a beta chain, contains structural domains similar to beta chains, however, lacks the short alpha region separating domains I and II. The structural organization of this gene also suggested that it had diverged considerably from the beta chain genes. Embryos of transgenic mice in which both alleles of the gamma 1 chain gene were inactivated by homologous recombination, lacked basement membranes, indicating that laminin, gamma 1 chain is necessary for laminin heterotrimer assembly. It has been inferred by analogy with the strikingly similar 3’ UTR sequence in mouse laminin gamma 1 cDNA, that multiple polyadenylation sites are utilized in human to generate the 2 different sized mRNAs (5.5 and 7.5 kb) seen on Northern analysis. 3915 NA
insulin like growth factor binding protein 3 ENSG00000146674 IGFBP3 This gene is a member of the insulin-like growth factor binding protein (IGFBP) family and encodes a protein with an IGFBP domain and a thyroglobulin type-I domain. The protein forms a ternary complex with insulin-like growth factor acid-labile subunit (IGFALS) and either insulin-like growth factor (IGF) I or II. In this form, it circulates in the plasma, prolonging the half-life of IGFs and altering their interaction with cell surface receptors. Alternate transcriptional splice variants, encoding different isoforms, have been characterized. 3486 NA
actin, alpha 2, smooth muscle, aorta ENSG00000107796 ACTA2 The protein encoded by this gene belongs to the actin family of proteins, which are highly conserved proteins that play a role in cell motility, structure and integrity. Alpha, beta and gamma actin isoforms have been identified, with alpha actins being a major constituent of the contractile apparatus, while beta and gamma actins are involved in the regulation of cell motility. This actin is an alpha actin that is found in skeletal muscle. Defects in this gene cause aortic aneurysm familial thoracic type 6. Multiple alternatively spliced variants, encoding the same protein, have been identified. 59 NA
testin LIM domain protein ENSG00000135269 TES Cancer-associated chromosomal changes often involve regions containing fragile sites. This gene maps to a common fragile site on chromosome 7q31.2 designated FRA7G. This gene is similar to mouse Testin, a testosterone-responsive gene encoding a Sertoli cell secretory protein containing three LIM domains. LIM domains are double zinc-finger motifs that mediate protein-protein interactions between transcription factors, cytoskeletal proteins and signaling proteins. This protein is a negative regulator of cell growth and may act as a tumor suppressor. This scaffold protein may also play a role in cell adhesion, cell spreading and in the reorganization of the actin cytoskeleton. Multiple protein isoforms are encoded by transcript variants of this gene. 26136 NA
pleckstrin ENSG00000115956 PLEK NA 5341 NA
linker for activation of T-cells family member 2 ENSG00000086730 LAT2 This gene is one of the contiguous genes at 7q11.23 commonly deleted in Williams syndrome, a multisystem developmental disorder. This gene consists of at least 14 exons, and its alternative splicing generates 3 transcript variants, all encoding the same protein. 7462 NA
troponin C1, slow skeletal and cardiac type ENSG00000114854 TNNC1 Troponin is a central regulatory protein of striated muscle contraction, and together with tropomyosin, is located on the actin filament. Troponin consists of 3 subunits: TnI, which is the inhibitor of actomyosin ATPase; TnT, which contains the binding site for tropomyosin; and TnC, the protein encoded by this gene. The binding of calcium to TnC abolishes the inhibitory action of TnI, thus allowing the interaction of actin with myosin, the hydrolysis of ATP, and the generation of tension. Mutations in this gene are associated with cardiomyopathy dilated type 1Z. 7134 NA
LYN proto-oncogene, Src family tyrosine kinase ENSG00000254087 LYN This gene encodes a tyrosine protein kinase, which may be involved in the regulation of mast cell degranulation and erythroid differentiation. Alternatively spliced transcript variants encoding different isoforms have been found for this gene. 4067 NA
regulator of G-protein signaling 2 ENSG00000116741 RGS2 Regulator of G protein signaling (RGS) family members are regulatory molecules that act as GTPase activating proteins (GAPs) for G alpha subunits of heterotrimeric G proteins. RGS proteins are able to deactivate G protein subunits of the Gi alpha, Go alpha and Gq alpha subtypes. They drive G proteins into their inactive GDP-bound forms. Regulator of G protein signaling 2 belongs to this family. The protein acts as a mediator of myeloid differentiation and may play a role in leukemogenesis. 5997 NA
ADAM metallopeptidase domain 8 ENSG00000151651 ADAM8 This gene encodes a member of the ADAM (a disintegrin and metalloprotease domain) family. Members of this family are membrane-anchored proteins structurally related to snake venom disintegrins, and have been implicated in a variety of biological processes involving cell-cell and cell-matrix interactions, including fertilization, muscle development, and neurogenesis. The protein encoded by this gene may be involved in cell adhesion during neurodegeneration, and it is thought to be a target for allergic respiratory diseases, including asthma. Alternative splicing results in multiple transcript variants. 101 NA
platelet derived growth factor receptor beta ENSG00000113721 PDGFRB This gene encodes a cell surface tyrosine kinase receptor for members of the platelet-derived growth factor family. These growth factors are mitogens for cells of mesenchymal origin. The identity of the growth factor bound to a receptor monomer determines whether the functional receptor is a homodimer or a heterodimer, composed of both platelet-derived growth factor receptor alpha and beta polypeptides. This gene is flanked on chromosome 5 by the genes for granulocyte-macrophage colony-stimulating factor and macrophage-colony stimulating factor receptor; all three genes may be implicated in the 5-q syndrome. A translocation between chromosomes 5 and 12, that fuses this gene to that of the translocation, ETV6, leukemia gene, results in chronic myeloproliferative disorder with eosinophilia. 5159 NA
synaptopodin 2 ENSG00000172403 SYNPO2 NA 171024 NA
phosphodiesterase 4D interacting protein ENSG00000178104 PDE4DIP The protein encoded by this gene serves to anchor phosphodiesterase 4D to the Golgi/centrosome region of the cell. Defects in this gene may be a cause of myeloproliferative disorder (MBD) associated with eosinophilia. Several transcript variants encoding different isoforms have been found for this gene. 9659 NA
four and a half LIM domains 2 ENSG00000115641 FHL2 This gene encodes a member of the four-and-a-half-LIM-only protein family. Family members contain two highly conserved, tandemly arranged, zinc finger domains with four highly conserved cysteines binding a zinc atom in each zinc finger. This protein is thought to have a role in the assembly of extracellular membranes. Also, this gene is down-regulated during transformation of normal myoblasts to rhabdomyosarcoma cells and the encoded protein may function as a link between presenilin-2 and an intracellular signaling pathway. Multiple alternatively spliced variants encoding different isoforms have been identified. 2274 NA
CAP, adenylate cyclase-associated protein 1 (yeast) ENSG00000131236 CAP1 The protein encoded by this gene is related to the S. cerevisiae CAP protein, which is involved in the cyclic AMP pathway. The human protein is able to interact with other molecules of the same protein, as well as with CAP2 and actin. Alternatively spliced transcript variants have been identified. 10487 NA
galectin 1 ENSG00000100097 LGALS1 The galectins are a family of beta-galactoside-binding proteins implicated in modulating cell-cell and cell-matrix interactions. This gene product may act as an autocrine negative growth factor that regulates cell proliferation. 3956 NA
pleckstrin homology and RhoGEF domain containing G5 ENSG00000171680 PLEKHG5 This gene encodes a protein that activates the nuclear factor kappa B (NFKB1) signaling pathway. Mutations in this gene are associated with autosomal recessive distal spinal muscular atrophy. Multiple transcript variants encoding different isoforms have been found for this gene. 57449 NA
LIM domain and actin binding 1 ENSG00000050405 LIMA1 This gene encodes a cytoskeleton-associated protein that inhibits actin filament depolymerization and cross-links filaments in bundles. It is downregulated in some cancer cell lines. Alternatively spliced transcript variants encoding different isoforms have been described for this gene, and expression of some of the variants may be independently regulated. 51474 NA
docking protein 3 ENSG00000146094 DOK3 NA 79930 NA
Rho GTPase activating protein 45 ENSG00000180448 ARHGAP45 NA 23526 NA
lipoprotein lipase ENSG00000175445 LPL LPL encodes lipoprotein lipase, which is expressed in heart, muscle, and adipose tissue. LPL functions as a homodimer, and has the dual functions of triglyceride hydrolase and ligand/bridging factor for receptor-mediated lipoprotein uptake. Severe mutations that cause LPL deficiency result in type I hyperlipoproteinemia, while less extreme mutations in LPL are linked to many disorders of lipoprotein metabolism. 4023 NA
natriuretic peptide A ENSG00000175206 NPPA The protein encoded by this gene belongs to the natriuretic peptide family. Natriuretic peptides are implicated in the control of extracellular fluid volume and electrolyte homeostasis. This protein is synthesized as a large precursor (containing a signal peptide), which is processed to release a peptide from the N-terminus with similarity to vasoactive peptide, cardiodilatin, and another peptide from the C-terminus with natriuretic-diuretic activity. Mutations in this gene have been associated with atrial fibrillation familial type 6. This gene is located adjacent to another member of the natriuretic family of peptides on chromosome 1. 4878 NA
TBC1 domain family member 1 ENSG00000065882 TBC1D1 TBC1D1 is the founding member of a family of proteins sharing a 180- to 200-amino acid TBC domain presumed to have a role in regulating cell growth and differentiation. These proteins share significant homology with TRE2 (USP6; MIM 604334), yeast Bub2, and CDC16 (MIM 603461) (White et al., 2000 [PubMed 10965142]). 23216 NA
synemin ENSG00000182253 SYNM The protein encoded by this gene is an intermediate filament (IF) family member. IF proteins are cytoskeletal proteins that confer resistance to mechanical stress and are encoded by a dispersed multigene family. This protein has been found to form a linkage between desmin, which is a subunit of the IF network, and the extracellular matrix, and provides an important structural support in muscle. Two alternatively spliced variants encoding different isoforms have been described for this gene. 23336 NA
S100 calcium binding protein A10 ENSG00000197747 S100A10 The protein encoded by this gene is a member of the S100 family of proteins containing 2 EF-hand calcium-binding motifs. S100 proteins are localized in the cytoplasm and/or nucleus of a wide range of cells, and involved in the regulation of a number of cellular processes such as cell cycle progression and differentiation. S100 genes include at least 13 members which are located as a cluster on chromosome 1q21. This protein may function in exocytosis and endocytosis. 6281 NA
pleckstrin and Sec7 domain containing 3 ENSG00000156011 PSD3 NA 23362 NA
G-protein signaling modulator 3 ENSG00000213654 GPSM3 NA 63940 NA
EGF containing fibulin like extracellular matrix protein 1 ENSG00000115380 EFEMP1 This gene encodes a member of the fibulin family of extracellular matrix glycoproteins. Like all members of this family, the encoded protein contains tandemly repeated epidermal growth factor-like repeats followed by a C-terminus fibulin-type domain. This gene is upregulated in malignant gliomas and may play a role in the aggressive nature of these tumors. Mutations in this gene are associated with Doyne honeycomb retinal dystrophy. Alternatively spliced transcript variants that encode the same protein have been described. 2202 NA
Rho GDP dissociation inhibitor beta ENSG00000111348 ARHGDIB Members of the Rho (or ARH) protein family (see MIM 165390) and other Ras-related small GTP-binding proteins (see MIM 179520) are involved in diverse cellular events, including cell signaling, proliferation, cytoskeletal organization, and secretion. The GTP-binding proteins are active only in the GTP-bound state. At least 3 classes of proteins tightly regulate cycling between the GTP-bound and GDP-bound states: GTPase-activating proteins (GAPs), guanine nucleotide-releasing factors (GRFs), and GDP-dissociation inhibitors (GDIs). The GDIs, including ARHGDIB, decrease the rate of GDP dissociation from Ras-like GTPases (summary by Scherle et al., 1993 [PubMed 8356058]). 397 NA
phosphatidylinositol-3,4,5-trisphosphate dependent Rac exchange factor 1 ENSG00000124126 PREX1 The protein encoded by this gene acts as a guanine nucleotide exchange factor for the RHO family of small GTP-binding proteins (RACs). It has been shown to bind to and activate RAC1 by exchanging bound GDP for free GTP. The encoded protein, which is found mainly in the cytoplasm, is activated by phosphatidylinositol-3,4,5-trisphosphate and the beta-gamma subunits of heterotrimeric G proteins. 57580 NA
olfactomedin like 3 ENSG00000116774 OLFML3 NA 56944 NA
neuritin 1 ENSG00000124785 NRN1 This gene encodes a member of the neuritin family, and is expressed in postmitotic-differentiating neurons of the developmental nervous system and neuronal structures associated with plasticity in the adult. The expression of this gene can be induced by neural activity and neurotrophins. The encoded protein contains a consensus cleavage signal found in glycosylphosphatidylinositol (GPI)-anchored proteins. The encoded protein promotes neurite outgrowth and arborization, suggesting its role in promoting neuritogenesis. Overexpression of the encoded protein may be associated with astrocytoma progression. Alternative splicing results in multiple transcript variants. 51299 NA
thymocyte selection associated family member 2 ENSG00000130775 THEMIS2 NA 9473 NA
myosin light chain kinase ENSG00000065534 MYLK This gene, a muscle member of the immunoglobulin gene superfamily, encodes myosin light chain kinase which is a calcium/calmodulin dependent enzyme. This kinase phosphorylates myosin regulatory light chains to facilitate myosin interaction with actin filaments to produce contractile activity. This gene encodes both smooth muscle and nonmuscle isoforms. In addition, using a separate promoter in an intron in the 3’ region, it encodes telokin, a small protein identical in sequence to the C-terminus of myosin light chain kinase, that is independently expressed in smooth muscle and functions to stabilize unphosphorylated myosin filaments. A pseudogene is located on the p arm of chromosome 3. Four transcript variants that produce four isoforms of the calcium/calmodulin dependent enzyme have been identified as well as two transcripts that produce two isoforms of telokin. Additional variants have been identified but lack full length transcripts. 4638 NA
NA ENSG00000263065 AF001548.6 NA ENSG00000263065 NA
ArfGAP with coiled-coil, ankyrin repeat and PH domains 1 ENSG00000072818 ACAP1 NA 9744 NA
thrombospondin 1 ENSG00000137801 THBS1 The protein encoded by this gene is a subunit of a disulfide-linked homotrimeric protein. This protein is an adhesive glycoprotein that mediates cell-to-cell and cell-to-matrix interactions. This protein can bind to fibrinogen, fibronectin, laminin, type V collagen and integrins alpha-V/beta-1. This protein has been shown to play roles in platelet aggregation, angiogenesis, and tumorigenesis. 7057 NA
Rho GTPase activating protein 30 ENSG00000186517 ARHGAP30 NA 257106 NA
ACTA2 antisense RNA 1 ENSG00000180139 ACTA2-AS1 NA ENSG00000180139 NA
tenascin XB ENSG00000168477 TNXB This gene encodes a member of the tenascin family of extracellular matrix glycoproteins. The tenascins have anti-adhesive effects, as opposed to fibronectin which is adhesive. This protein is thought to function in matrix maturation during wound healing, and its deficiency has been associated with the connective tissue disorder Ehlers-Danlos syndrome. This gene localizes to the major histocompatibility complex (MHC) class III region on chromosome 6. It is one of four genes in this cluster which have been duplicated. The duplicated copy of this gene is incomplete and is a pseudogene which is transcribed but does not encode a protein. The structure of this gene is unusual in that it overlaps the CREBL1 and CYP21A2 genes at its 5’ and 3’ ends, respectively. Multiple transcript variants encoding different isoforms have been found for this gene. 7148 NA
serpin family F member 1 ENSG00000132386 SERPINF1 The protein encoded by this gene is a member of the serpin family, although it does not display the serine protease inhibitory activity shown by many of the other serpin family members. The encoded protein is secreted and strongly inhibits angiogenesis. In addition, this protein is a neurotrophic factor involved in neuronal differentiation in retinoblastoma cells. 5176 NA
Jun proto-oncogene, AP-1 transcription factor subunit ENSG00000177606 JUN This gene is the putative transforming gene of avian sarcoma virus 17. It encodes a protein which is highly similar to the viral protein, and which interacts directly with specific target DNA sequences to regulate gene expression. This gene is intronless and is mapped to 1p32-p31, a chromosomal region involved in both translocations and deletions in human malignancies. 3725 NA
MOB kinase activator 3A ENSG00000172081 MOB3A NA 126308 NA
regulator of G-protein signaling 14 ENSG00000169220 RGS14 This gene encodes a member of the regulator of G-protein signaling family. This protein contains one RGS domain, two Raf-like Ras-binding domains (RBDs), and one GoLoco domain. The protein attenuates the signaling activity of G-proteins by binding, through its GoLoco domain, to specific types of activated, GTP-bound G alpha subunits. Acting as a GTPase activating protein (GAP), the protein increases the rate of conversion of the GTP to GDP. This hydrolysis allows the G alpha subunits to bind G beta/gamma subunit heterodimers, forming inactive G-protein heterotrimers, thereby terminating the signal. Alternate transcriptional splice variants of this gene have been observed but have not been thoroughly characterized. 10636 NA
cysteine rich transmembrane BMP regulator 1 (chordin-like) ENSG00000150938 CRIM1 This gene encodes a transmembrane protein containing six cysteine-rich repeat domains and an insulin-like growth factor-binding domain. The encoded protein may play a role in tissue development through interactions with members of the transforming growth factor beta family, such as bone morphogenetic proteins. 51232 NA
# Write the Ensembl IDs queried for factor 1 to a text file, one ID per line.
write.table(out$query, paste0("../utilities/GTEX2013_sparse_fac_sqrt/gene_names_clus_", 1, ".txt"),
            col.names = FALSE, row.names = FALSE, quote = FALSE)
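
The per-factor export above is repeated by hand for each factor. A minimal sketch of how it could be looped is given below, assuming that writing the rows of gene_list directly is acceptable in place of out$query. The helper name write_factor_genes and the toy input are hypothetical and are not part of the original analysis.

```r
# Hypothetical helper (an assumption, not the original code): write the Ensembl
# IDs of every factor to its own file, mirroring the per-factor write.table() call.
write_factor_genes <- function(gene_list, out_dir) {
  dir.create(out_dir, recursive = TRUE, showWarnings = FALSE)
  files <- character(nrow(gene_list))
  for (k in seq_len(nrow(gene_list))) {
    files[k] <- file.path(out_dir, paste0("gene_names_clus_", k, ".txt"))
    # One ID per line, no header, no row names, no quotes.
    write.table(gene_list[k, ], files[k],
                col.names = FALSE, row.names = FALSE, quote = FALSE)
  }
  invisible(files)
}

# Toy input standing in for gene_list (rows = factors, columns = top genes).
toy_gene_list <- rbind(c("ENSG00000171401", "ENSG00000170477"),
                       c("ENSG00000115414", "ENSG00000186395"))
toy_dir <- file.path(tempdir(), "GTEX2013_sparse_fac_sqrt")
written <- write_factor_genes(toy_gene_list, toy_dir)
```

In the real analysis the second argument would presumably be "../utilities/GTEX2013_sparse_fac_sqrt". Note that out$query can differ from the input row if mygene returns duplicate hits for an ID, so the two approaches are not guaranteed to produce identical files.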

Factor 2 Annotations

out <- mygene::queryMany(gene_list[2,],  scopes="ensembl.gene", fields=c("name", "summary", "symbol"), species="human");
## Finished
kable(as.data.frame(out))
summary X_id query symbol name
This gene encodes fibronectin, a glycoprotein present in a soluble dimeric form in plasma, and in a dimeric or multimeric form at the cell surface and in extracellular matrix. The encoded preproprotein is proteolytically processed to generate the mature protein. Fibronectin is involved in cell adhesion and migration processes including embryogenesis, wound healing, blood coagulation, host defense, and metastasis. The gene has three regions subject to alternative splicing, with the potential to produce 20 different transcript variants, at least one of which encodes an isoform that undergoes proteolytic processing. The full-length nature of some variants has not been determined. 2335 ENSG00000115414 FN1 fibronectin 1
This gene encodes a member of the type I (acidic) cytokeratin family, which belongs to the superfamily of intermediate filament (IF) proteins. Keratins are heteropolymeric structural proteins which form the intermediate filament. These filaments, along with actin microfilaments and microtubules, compose the cytoskeleton of epithelial cells. Mutations in this gene are associated with epidermolytic hyperkeratosis. This gene is located within a cluster of keratin family members on chromosome 17q21. 3858 ENSG00000186395 KRT10 keratin 10
This gene encodes the alpha chain of type XVIII collagen. This collagen is one of the multiplexins, extracellular matrix proteins that contain multiple triple-helix domains (collagenous domains) interrupted by non-collagenous domains. A long isoform of the protein has an N-terminal domain that is homologous to the extracellular part of frizzled receptors. Proteolytic processing at several endogenous cleavage sites in the C-terminal domain results in production of endostatin, a potent antiangiogenic protein that is able to inhibit angiogenesis and tumor growth. Mutations in this gene are associated with Knobloch syndrome. The main features of this syndrome involve retinal abnormalities, so type XVIII collagen may play an important role in retinal structure and in neural tube closure. Alternative splicing results in multiple transcript variants. 80781 ENSG00000182871 COL18A1 collagen type XVIII alpha 1 chain
The protein encoded by this gene is a member of the keratin gene family. The type II cytokeratins consist of basic or neutral proteins which are arranged in pairs of heterotypic keratin chains coexpressed during differentiation of simple and stratified epithelial tissues. This type II cytokeratin is specifically expressed in the spinous and granular layers of the epidermis with family member KRT10 and mutations in these genes have been associated with bullous congenital ichthyosiform erythroderma. The type II cytokeratins are clustered in a region of chromosome 12q12-q13. 3848 ENSG00000167768 KRT1 keratin 1
This gene encodes apolipoprotein A-I, which is the major protein component of high density lipoprotein (HDL) in plasma. The encoded preproprotein is proteolytically processed to generate the mature protein, which promotes cholesterol efflux from tissues to the liver for excretion, and is a cofactor for lecithin cholesterolacyltransferase (LCAT), an enzyme responsible for the formation of most plasma cholesteryl esters. This gene is closely linked with two other apolipoprotein genes on chromosome 11. Defects in this gene are associated with HDL deficiencies, including Tangier disease, and with systemic non-neuropathic amyloidosis. Alternative splicing results in multiple transcript variants, at least one of which encodes a preproprotein. 335 ENSG00000118137 APOA1 apolipoprotein A1
The protein encoded by this gene is a small secreted cysteine-rich protein and a member of the CCN family of regulatory proteins. CCN family proteins associate with the extracellular matrix and play an important role in cardiovascular and skeletal development, fibrosis and cancer development. 4856 ENSG00000136999 NOV nephroblastoma overexpressed
This gene encodes the heavy chain subunit of the pre-alpha-trypsin inhibitor complex. This complex may stabilize the extracellular matrix through its ability to bind hyaluronic acid. Polymorphisms of this gene may be associated with increased risk for schizophrenia and major depressive disorder. This gene is present in an inter-alpha-trypsin inhibitor family gene cluster on chromosome 3. 3699 ENSG00000162267 ITIH3 inter-alpha-trypsin inhibitor heavy chain 3
The protein encoded by the classic MBP gene is a major constituent of the myelin sheath of oligodendrocytes and Schwann cells in the nervous system. However, MBP-related transcripts are also present in the bone marrow and the immune system. These mRNAs arise from the long MBP gene (otherwise called ‘Golli-MBP’) that contains 3 additional exons located upstream of the classic MBP exons. Alternative splicing from the Golli and the MBP transcription start sites gives rise to 2 sets of MBP-related transcripts and gene products. The Golli mRNAs contain 3 exons unique to Golli-MBP, spliced in-frame to 1 or more MBP exons. They encode hybrid proteins that have N-terminal Golli aa sequence linked to MBP aa sequence. The second family of transcripts contain only MBP exons and produce the well characterized myelin basic proteins. This complex gene structure is conserved among species suggesting that the MBP transcription unit is an integral part of the Golli transcription unit and that this arrangement is important for the function and/or regulation of these genes. 4155 ENSG00000197971 MBP myelin basic protein
The protein encoded by this gene is secreted and likely acts as an inhibitor of bone formation. The encoded protein is found in the organic matrix of bone and cartilage. Defects in this gene are a cause of Keutel syndrome (KS). Two transcript variants encoding different isoforms have been found for this gene. 4256 ENSG00000111341 MGP matrix Gla protein
The protein encoded by this gene, pre-angiotensinogen or angiotensinogen precursor, is expressed in the liver and is cleaved by the enzyme renin in response to lowered blood pressure. The resulting product, angiotensin I, is then cleaved by angiotensin converting enzyme (ACE) to generate the physiologically active enzyme angiotensin II. The protein is involved in maintaining blood pressure and in the pathogenesis of essential hypertension and preeclampsia. Mutations in this gene are associated with susceptibility to essential hypertension, and can cause renal tubular dysgenesis, a severe disorder of renal tubular development. Defects in this gene have also been associated with non-familial structural atrial fibrillation, and inflammatory bowel disease. 183 ENSG00000135744 AGT angiotensinogen
Albumin is a soluble, monomeric protein which comprises about one-half of the blood serum protein. Albumin functions primarily as a carrier protein for steroids, fatty acids, and thyroid hormones and plays a role in stabilizing extracellular fluid volume. Albumin is a globular unglycosylated serum protein of molecular weight 65,000. Albumin is synthesized in the liver as preproalbumin which has an N-terminal peptide that is removed before the nascent protein is released from the rough endoplasmic reticulum. The product, proalbumin, is in turn cleaved in the Golgi vesicles to produce the secreted albumin. 213 ENSG00000163631 ALB albumin
The protein encoded by this gene is a metalloprotein that binds most of the copper in plasma and is involved in the peroxidation of Fe(II)transferrin to Fe(III) transferrin. Mutations in this gene cause aceruloplasminemia, which results in iron accumulation and tissue damage, and is associated with diabetes and neurologic abnormalities. Two transcript variants, one protein-coding and the other not protein-coding, have been found for this gene. 1356 ENSG00000047457 CP ceruloplasmin (ferroxidase)
ARHGEF10L is a member of the RhoGEF family of guanine nucleotide exchange factors (GEFs) that activate Rho GTPases (Winkler et al., 2005 [PubMed 16112081]). 55160 ENSG00000074964 ARHGEF10L Rho guanine nucleotide exchange factor 10 like
The protein encoded by this gene is a transformation and shape-change sensitive actin cross-linking/gelling protein found in fibroblasts and smooth muscle. Its expression is down-regulated in many cell lines, and this down-regulation may be an early and sensitive marker for the onset of transformation. A functional role of this protein is unclear. Two transcript variants encoding the same protein have been found for this gene. 6876 ENSG00000149591 TAGLN transgelin
The protein encoded by this gene is a subunit of a disulfide-linked homotrimeric protein. This protein is an adhesive glycoprotein that mediates cell-to-cell and cell-to-matrix interactions. This protein can bind to fibrinogen, fibronectin, laminin, type V collagen and integrins alpha-V/beta-1. This protein has been shown to play roles in platelet aggregation, angiogenesis, and tumorigenesis. 7057 ENSG00000137801 THBS1 thrombospondin 1
The protein encoded by this gene belongs to the actin family of proteins, which are highly conserved proteins that play a role in cell motility, structure and integrity. Alpha, beta and gamma actin isoforms have been identified, with alpha actins being a major constituent of the contractile apparatus, while beta and gamma actins are involved in the regulation of cell motility. This actin is an alpha actin that is found in smooth muscle. Defects in this gene cause aortic aneurysm familial thoracic type 6. Multiple alternatively spliced variants, encoding the same protein, have been identified. 59 ENSG00000107796 ACTA2 actin, alpha 2, smooth muscle, aorta
This gene encodes the anterior pituitary hormone prolactin. This secreted hormone is a growth regulator for many tissues, including cells of the immune system. It may also play a role in cell survival by suppressing apoptosis, and it is essential for lactation. Alternative splicing results in multiple transcript variants that encode the same protein. 5617 ENSG00000172179 PRL prolactin
This gene encodes a member of the intermediate filament family. Intermediate filaments, along with microtubules and actin microfilaments, make up the cytoskeleton. The protein encoded by this gene is responsible for maintaining cell shape, integrity of the cytoplasm, and stabilizing cytoskeletal interactions. It is also involved in the immune response, and controls the transport of low-density lipoprotein (LDL)-derived cholesterol from a lysosome to the site of esterification. It functions as an organizer of a number of critical proteins involved in attachment, migration, and cell signaling. Mutations in this gene cause a dominant, pulverulent cataract. 7431 ENSG00000026025 VIM vimentin
The protein encoded by this gene is a smooth muscle myosin belonging to the myosin heavy chain family. The gene product is a subunit of a hexameric protein that consists of two heavy chain subunits and two pairs of non-identical light chain subunits. It functions as a major contractile protein, converting chemical energy into mechanical energy through the hydrolysis of ATP. The gene encoding a human ortholog of rat NUDE1 is transcribed from the reverse strand of this gene, and its 3’ end overlaps with that of the latter. The pericentric inversion of chromosome 16 [inv(16)(p13q22)] produces a chimeric transcript that encodes a protein consisting of the first 165 residues from the N terminus of core-binding factor beta in a fusion with the C-terminal portion of the smooth muscle myosin heavy chain. This chromosomal rearrangement is associated with acute myeloid leukemia of the M4Eo subtype. Alternative splicing generates isoforms that are differentially expressed, with ratios changing during muscle cell maturation. Alternatively spliced transcript variants encoding different isoforms have been identified. 4629 ENSG00000133392 MYH11 myosin, heavy chain 11, smooth muscle
The protein encoded by this gene is a secreted chaperone that can under some stress conditions also be found in the cell cytosol. It has been suggested to be involved in several basic biological events such as cell death, tumor progression, and neurodegenerative disorders. Alternate splicing results in both coding and non-coding variants. 1191 ENSG00000120885 CLU clusterin
The protein encoded by this gene is a member of the keratin gene family. The type II cytokeratins consist of basic or neutral proteins which are arranged in pairs of heterotypic keratin chains coexpressed during differentiation of simple and stratified epithelial tissues. This type II cytokeratin is expressed largely in the upper spinous layer of epidermal keratinocytes and mutations in this gene have been associated with bullous congenital ichthyosiform erythroderma. The type II cytokeratins are clustered in a region of chromosome 12q12-q13. 3849 ENSG00000172867 KRT2 keratin 2
Actins are highly conserved proteins that are involved in various types of cell motility. Polymerization of globular actin (G-actin) leads to a structural filament (F-actin) in the form of a two-stranded helix. Each actin can bind to four others. The protein encoded by this gene belongs to the actin family which is comprised of three main groups of actin isoforms, alpha, beta, and gamma. The alpha actins are found in muscle tissues and are a major constituent of the contractile apparatus. Defects in this gene have been associated with idiopathic dilated cardiomyopathy (IDC) and familial hypertrophic cardiomyopathy (FHC). 70 ENSG00000159251 ACTC1 actin, alpha, cardiac muscle 1
This gene encodes the alpha subunit of the coagulation factor fibrinogen, which is a component of the blood clot. Following vascular injury, the encoded preproprotein is proteolytically processed by thrombin during the conversion of fibrinogen to fibrin. Mutations in this gene lead to several disorders, including dysfibrinogenemia, hypofibrinogenemia, afibrinogenemia and renal amyloidosis. Alternative splicing results in multiple transcript variants, at least one of which encodes an isoform that undergoes proteolytic processing. 2243 ENSG00000171560 FGA fibrinogen alpha chain
Apolipoprotein C-III is a very low density lipoprotein (VLDL) protein. APOC3 inhibits lipoprotein lipase and hepatic lipase; it is thought to delay catabolism of triglyceride-rich particles. The APOA1, APOC3 and APOA4 genes are closely linked in both rat and human genomes. The A-I and A-IV genes are transcribed from the same strand, while the A-1 and C-III genes are convergently transcribed. An increase in apoC-III levels induces the development of hypertriglyceridemia. 345 ENSG00000110245 APOC3 apolipoprotein C3
The mitochondrial enzyme encoded by this gene catalyzes synthesis of carbamoyl phosphate from ammonia and bicarbonate. This reaction is the first committed step of the urea cycle, which is important in the removal of excess urea from cells. The encoded protein may also represent a core mitochondrial nucleoid protein. Three transcript variants encoding different isoforms have been found for this gene. The shortest isoform may not be localized to the mitochondrion. Mutations in this gene have been associated with carbamoyl phosphate synthetase deficiency, susceptibility to persistent pulmonary hypertension, and susceptibility to venoocclusive disease after bone marrow transplantation. 1373 ENSG00000021826 CPS1 carbamoyl-phosphate synthase 1
The protein encoded by this gene is a glutathione-independent prostaglandin D synthase that catalyzes the conversion of prostaglandin H2 (PGH2) to prostaglandin D2 (PGD2). PGD2 functions as a neuromodulator as well as a trophic factor in the central nervous system. PGD2 is also involved in smooth muscle contraction/relaxation and is a potent inhibitor of platelet aggregation. This gene is preferentially expressed in brain. Studies with transgenic mice overexpressing this gene suggest that this gene may also be involved in the regulation of non-rapid eye movement sleep. 5730 ENSG00000107317 PTGDS prostaglandin D2 synthase
The protein encoded by this gene is the beta component of fibrinogen, a blood-borne glycoprotein comprised of three pairs of nonidentical polypeptide chains. Following vascular injury, fibrinogen is cleaved by thrombin to form fibrin which is the most abundant component of blood clots. In addition, various cleavage products of fibrinogen and fibrin regulate cell adhesion and spreading, display vasoconstrictor and chemotactic activities, and are mitogens for several cell types. Mutations in this gene lead to several disorders, including afibrinogenemia, dysfibrinogenemia, hypodysfibrinogenemia and thrombotic tendency. Alternatively spliced transcript variants encoding different isoforms have been found for this gene. 2244 ENSG00000171564 FGB fibrinogen beta chain
This gene encodes a regulatory subunit of protein phosphatase-1 (PP1). PP1 catalyzes reversible protein phosphorylation, which is important in a wide range of cellular activities: neuronal, muscular, RNA splicing, protein synthesis, cell death, and glycogen metabolism, to name just a few. By interacting with different regulatory subunits, PP1 is directed to different parts of the cell, to different substrates, or to respond to extracellular signals. 5507 ENSG00000119938 PPP1R3C protein phosphatase 1 regulatory subunit 3C
The protein encoded by this gene is an enzyme in the catabolic pathway of tyrosine. The encoded protein catalyzes the conversion of 4-hydroxyphenylpyruvate to homogentisate. Defects in this gene are a cause of tyrosinemia type 3 (TYRO3) and hawkinsinuria (HAWK). Two transcript variants encoding different isoforms have been found for this gene. 3242 ENSG00000158104 HPD 4-hydroxyphenylpyruvate dioxygenase
The protein encoded by this gene is one of six similar proteins that bind insulin-like growth factors I and II (IGF-I and IGF-II). The encoded protein can be secreted into the bloodstream, where it binds IGF-I and IGF-II with high affinity, or it can remain intracellular, interacting with many different ligands. High expression levels of this protein promote the growth of several types of tumors and may be predictive of the chances of recovery of the patient. Several transcript variants, one encoding a secreted isoform and the others encoding nonsecreted isoforms, have been found for this gene. 3485 ENSG00000115457 IGFBP2 insulin like growth factor binding protein 2
The protein encoded by this gene is a transmembrane (type I) heparan sulfate proteoglycan and is a member of the syndecan proteoglycan family. The syndecans mediate cell binding, cell signaling, and cytoskeletal organization and syndecan receptors are required for internalization of the HIV-1 tat protein. The syndecan-1 protein functions as an integral membrane protein and participates in cell proliferation, cell migration and cell-matrix interactions via its receptor for extracellular matrix proteins. Altered syndecan-1 expression has been detected in several different tumor types. While several transcript variants may exist for this gene, the full-length natures of only two have been described to date. These two represent the major variants of this gene and encode the same protein. 6382 ENSG00000115884 SDC1 syndecan 1
This gene is upregulated in inflammatory diseases, and it was first observed as expressed in the differentiated layers of skin. The most interesting aspect of this gene is the differential use of promoters and terminators to generate isoforms with unique cellular distributions and domain components. Alternatively spliced transcript variants encoding different isoforms have been identified for this gene. 93099 ENSG00000161249 DMKN dermokine
The protein encoded by this gene is the tropomyosin-binding subunit of the troponin complex, which is located on the thin filament of striated muscles and regulates muscle contraction in response to alterations in intracellular calcium ion concentration. Mutations in this gene have been associated with familial hypertrophic cardiomyopathy as well as with dilated cardiomyopathy. Transcripts for this gene undergo alternative splicing that results in many tissue-specific isoforms; however, the full-length nature of some of these variants has not yet been determined. 7139 ENSG00000118194 TNNT2 troponin T2, cardiac type
This gene encodes a member of the CHD family of proteins which are characterized by the presence of chromo (chromatin organization modifier) domains and SNF2-related helicase/ATPase domains. This protein is one of the components of a histone deacetylase complex referred to as the Mi-2/NuRD complex which participates in the remodeling of chromatin by deacetylating histones. Chromatin remodeling is essential for many processes including transcription. Autoantibodies against this protein are found in a subset of patients with dermatomyositis. Three alternatively spliced transcripts encoding different isoforms have been described. 1107 ENSG00000170004 CHD3 chromodomain helicase DNA binding protein 3
Arg and c-Abl represent the mammalian members of the Abelson family of non-receptor protein-tyrosine kinases. They interact with the Arg/Abl binding proteins via the SH3 domains present in the carboxy end of the latter group of proteins. This gene encodes the sorbin and SH3 domain containing 2 protein. It has three C-terminal SH3 domains and an N-terminal sorbin homology (SoHo) domain that interacts with lipid raft proteins. The subcellular localization of this protein in epithelial and cardiac muscle cells suggests that it functions as an adapter protein to assemble signaling complexes in stress fibers, and that it is a potential link between Abl family kinases and the actin cytoskeleton. Alternative splicing results in multiple transcript variants encoding different isoforms. 8470 ENSG00000154556 SORBS2 sorbin and SH3 domain containing 2
This gene encodes a member of the small leucine-rich proteoglycan family of proteins. Alternative splicing results in multiple transcript variants, at least one of which encodes a preproprotein that is proteolytically processed to generate the mature protein. This protein plays a role in collagen fibril assembly. Binding of this protein to multiple cell surface receptors mediates its role in tumor suppression, including a stimulatory effect on autophagy and inflammation and an inhibitory effect on angiogenesis and tumorigenesis. This gene and the related gene biglycan are thought to be the result of a gene duplication. Mutations in this gene are associated with congenital stromal corneal dystrophy in human patients. 1634 ENSG00000011465 DCN decorin
Members of the CELF/BRUNOL protein family contain two N-terminal RNA recognition motif (RRM) domains, one C-terminal RRM domain, and a divergent segment of 160-230 aa between the second and third RRM domains. Members of this protein family regulate pre-mRNA alternative splicing and may also be involved in mRNA editing, and translation. Alternative splicing results in multiple transcript variants encoding different isoforms. 10659 ENSG00000048740 CELF2 CUGBP, Elav-like family member 2
This gene encodes a muscle-specific class III intermediate filament. Homopolymers of this protein form a stable intracytoplasmic filamentous network connecting myofibrils to each other and to the plasma membrane. Mutations in this gene are associated with desmin-related myopathy, a familial cardiac and skeletal myopathy (CSM), and with distal myopathies. 1674 ENSG00000175084 DES desmin
This gene encodes a key acute phase plasma protein. Because of its increase due to acute inflammation, this protein is classified as an acute-phase reactant. The specific function of this protein has not yet been determined; however, it may be involved in aspects of immunosuppression. 5004 ENSG00000229314 ORM1 orosomucoid 1
This gene encodes a mitochondrially localized enzyme that catalyzes the reversible formation of acetoacetyl-CoA from two molecules of acetyl-CoA. Defects in this gene are associated with 3-ketothiolase deficiency, an inborn error of isoleucine catabolism characterized by urinary excretion of 2-methyl-3-hydroxybutyric acid, 2-methylacetoacetic acid, tiglylglycine, and butanone. 38 ENSG00000075239 ACAT1 acetyl-CoA acetyltransferase 1
This gene encodes a preproprotein, which is processed to yield both alpha and beta chains, which subsequently combine as a tetramer to produce haptoglobin. Haptoglobin functions to bind free plasma hemoglobin, which allows degradative enzymes to gain access to the hemoglobin, while at the same time preventing loss of iron through the kidneys and protecting the kidneys from damage by hemoglobin. Mutations in this gene and/or its regulatory regions cause ahaptoglobinemia or hypohaptoglobinemia. This gene has also been linked to diabetic nephropathy, the incidence of coronary artery disease in type 1 diabetes, Crohn’s disease, inflammatory disease behavior, primary sclerosing cholangitis, susceptibility to idiopathic Parkinson’s disease, and a reduced incidence of Plasmodium falciparum malaria. The protein encoded also exhibits antimicrobial activity against bacteria. A similar duplicated gene is located next to this gene on chromosome 16. Multiple transcript variants encoding different isoforms have been found for this gene. 3240 ENSG00000257017 HP haptoglobin
This gene encodes a member of the transducer of erbB-2/B-cell translocation gene protein family. Members of this family are anti-proliferative factors that have the potential to regulate cell growth. The encoded protein may function as a tumor suppressor. Alternate splicing results in multiple transcript variants. 10140 ENSG00000141232 TOB1 transducer of ERBB2, 1
This gene is a member of the Regulator of Complement Activation (RCA) gene cluster and encodes a protein with twenty short consensus repeat (SCR) domains. This protein is secreted into the bloodstream and has an essential role in the regulation of complement activation, restricting this innate defense mechanism to microbial infections. Mutations in this gene have been associated with hemolytic-uremic syndrome (HUS) and chronic hypocomplementemic nephropathy. Alternate transcriptional splice variants, encoding different isoforms, have been characterized. 3075 ENSG00000000971 CFH complement factor H
This gene encodes a leucine-rich cytoplasmic protein, which is highly similar to a mouse protein that negatively regulates Ca/calmodulin-dependent protein kinase II phosphorylation and may be essential for spatial learning processes. Several alternatively spliced transcript variants of this gene have been described. 23154 ENSG00000020129 NCDN neurochondrin
This gene encodes the pro-alpha1 chains of type I collagen whose triple helix comprises two alpha1 chains and one alpha2 chain. Type I is a fibril-forming collagen found in most connective tissues and is abundant in bone, cornea, dermis and tendon. Mutations in this gene are associated with osteogenesis imperfecta types I-IV, Ehlers-Danlos syndrome type VIIA, Ehlers-Danlos syndrome Classical type, Caffey Disease and idiopathic osteoporosis. Reciprocal translocations between chromosomes 17 and 22, where this gene and the gene for platelet-derived growth factor beta are located, are associated with a particular type of skin tumor called dermatofibrosarcoma protuberans, resulting from unregulated expression of the growth factor. Two transcripts, resulting from the use of alternate polyadenylation signals, have been identified for this gene. 1277 ENSG00000108821 COL1A1 collagen type I alpha 1
This gene encodes the heavy subunit of ferritin, the major intracellular iron storage protein in prokaryotes and eukaryotes. It is composed of 24 subunits of the heavy and light ferritin chains. Variation in ferritin subunit composition may affect the rates of iron uptake and release in different tissues. A major function of ferritin is the storage of iron in a soluble and nontoxic state. Defects in ferritin proteins are associated with several neurodegenerative diseases. This gene has multiple pseudogenes. Several alternatively spliced transcript variants have been observed, but their biological validity has not been determined. 2495 ENSG00000167996 FTH1 ferritin heavy chain 1
This gene encodes a trypsinogen, which is a member of the trypsin family of serine proteases. This enzyme is secreted by the pancreas and cleaved to its active form in the small intestine. It is active on peptide linkages involving the carboxyl group of lysine or arginine. Mutations in this gene are associated with hereditary pancreatitis. This gene and several other trypsinogen genes are localized to the T cell receptor beta locus on chromosome 7. 5644 ENSG00000204983 PRSS1 protease, serine 1
The protein encoded by this gene belongs to the glutamine synthetase family. It catalyzes the synthesis of glutamine from glutamate and ammonia in an ATP-dependent reaction. This protein plays a role in ammonia and glutamate detoxification, acid-base homeostasis, cell signaling, and cell proliferation. Glutamine is an abundant amino acid, and is important to the biosynthesis of several amino acids, pyrimidines, and purines. Mutations in this gene are associated with congenital glutamine deficiency, and overexpression of this gene was observed in some primary liver cancer samples. There are six pseudogenes of this gene found on chromosomes 2, 5, 9, 11, and 12. Alternative splicing results in multiple transcript variants. 2752 ENSG00000135821 GLUL glutamate-ammonia ligase
This gene encodes a member of the insulin-like growth factor (IGF)-binding protein (IGFBP) family. IGFBPs bind IGFs with high affinity, and regulate IGF availability in body fluids and tissues and modulate IGF binding to its receptors. This protein binds IGF-I and IGF-II with relatively low affinity, and belongs to a subfamily of low-affinity IGFBPs. It also stimulates prostacyclin production and cell adhesion. Alternatively spliced transcript variants encoding different isoforms have been described for this gene, and one variant has been associated with retinal arterial macroaneurysm (PMID:21835307). 3490 ENSG00000163453 IGFBP7 insulin like growth factor binding protein 7
This gene encodes a protein that anchors intermediate filaments to desmosomal plaques and forms an obligate component of functional desmosomes. Mutations in this gene are the cause of several cardiomyopathies and keratodermas, including skin fragility-woolly hair syndrome. Alternative splicing results in multiple transcript variants. 1832 ENSG00000096696 DSP desmoplakin
The protein encoded by this gene belongs to the Kank family of proteins, which contain multiple ankyrin repeat domains. This family member functions in cytoskeleton formation by regulating actin polymerization. This gene is a candidate tumor suppressor for renal cell carcinoma. Mutations in this gene cause cerebral palsy spastic quadriplegic type 2, a central nervous system development disorder. A t(5;9) translocation results in fusion of the platelet-derived growth factor receptor beta gene (PDGFRB) on chromosome 5 with this gene in a myeloproliferative neoplasm featuring severe thrombocythemia. Alternative splicing of this gene results in multiple transcript variants. A related pseudogene has been identified on chromosome 20. 23189 ENSG00000107104 KANK1 KN motif and ankyrin repeat domains 1
The protein encoded by this gene is the gamma component of fibrinogen, a blood-borne glycoprotein comprised of three pairs of nonidentical polypeptide chains. Following vascular injury, fibrinogen is cleaved by thrombin to form fibrin which is the most abundant component of blood clots. In addition, various cleavage products of fibrinogen and fibrin regulate cell adhesion and spreading, display vasoconstrictor and chemotactic activities, and are mitogens for several cell types. Mutations in this gene lead to several disorders, including dysfibrinogenemia, hypofibrinogenemia and thrombophilia. Alternative splicing results in transcript variants encoding different isoforms. 2266 ENSG00000171557 FGG fibrinogen gamma chain
The protein encoded by this gene is a member of the kinesin family and functions as an anterograde motor protein that transports membranous organelles along axonal microtubules. Mutations at this locus have been associated with spastic paraplegia-30 and hereditary sensory neuropathy IIC. Alternatively spliced transcript variants encoding distinct isoforms have been described. 547 ENSG00000130294 KIF1A kinesin family member 1A
The enzyme encoded by this gene is a multifunctional protein. Its main function is to catalyze the synthesis of palmitate from acetyl-CoA and malonyl-CoA, in the presence of NADPH, into long-chain saturated fatty acids. In some cancer cell lines, this protein has been found to be fused with estrogen receptor-alpha (ER-alpha), in which the N-terminus of FAS is fused in-frame with the C-terminus of ER-alpha. 2194 ENSG00000169710 FASN fatty acid synthase
The protein encoded by this gene belongs to a family of bifunctional proteins that are involved in both the synthesis and degradation of fructose-2,6-bisphosphate, a regulatory molecule that controls glycolysis in eukaryotes. The encoded protein has a 6-phosphofructo-2-kinase activity that catalyzes the synthesis of fructose-2,6-bisphosphate (F2,6BP), and a fructose-2,6-biphosphatase activity that catalyzes the degradation of F2,6BP. This protein is required for cell cycle progression and prevention of apoptosis. It functions as a regulator of cyclin-dependent kinase 1, linking glucose metabolism to cell proliferation and survival in tumor cells. Several alternatively spliced transcript variants encoding different isoforms have been found for this gene. 5209 ENSG00000170525 PFKFB3 6-phosphofructo-2-kinase/fructose-2,6-biphosphatase 3
This gene encodes a nebulin like protein that is abundantly expressed in cardiac muscle. The encoded protein binds actin and interacts with thin filaments and Z-line associated proteins in striated muscle. This protein may be involved in cardiac myofibril assembly. A shorter isoform of this protein termed LIM nebulette is expressed in non-muscle cells and may function as a component of focal adhesion complexes. Alternate splicing results in multiple transcript variants. 10529 ENSG00000078114 NEBL nebulette
Fructose-1,6-bisphosphate aldolase (EC 4.1.2.13) is a tetrameric glycolytic enzyme that catalyzes the reversible conversion of fructose-1,6-bisphosphate to glyceraldehyde 3-phosphate and dihydroxyacetone phosphate. Vertebrates have 3 aldolase isozymes which are distinguished by their electrophoretic and catalytic properties. Differences indicate that aldolases A, B, and C are distinct proteins, the products of a family of related ‘housekeeping’ genes exhibiting developmentally regulated expression of the different isozymes. The developing embryo produces aldolase A, which is produced in even greater amounts in adult muscle where it can be as much as 5% of total cellular protein. In adult liver, kidney and intestine, aldolase A expression is repressed and aldolase B is produced. In brain and other nervous tissue, aldolase A and C are expressed about equally. There is a high degree of homology between aldolase A and C. Defects in ALDOB cause hereditary fructose intolerance. 229 ENSG00000136872 ALDOB aldolase, fructose-bisphosphate B
This gene encodes a preproprotein that undergoes extensive, tissue-specific, post-translational processing via cleavage by subtilisin-like enzymes known as prohormone convertases. There are eight potential cleavage sites within the preproprotein and, depending on tissue type and the available convertases, processing may yield as many as ten biologically active peptides involved in diverse cellular functions. The encoded protein is synthesized mainly in corticotroph cells of the anterior pituitary where four cleavage sites are used; adrenocorticotrophin, essential for normal steroidogenesis and the maintenance of normal adrenal weight, and lipotropin beta are the major end products. In other tissues, including the hypothalamus, placenta, and epithelium, all cleavage sites may be used, giving rise to peptides with roles in pain and energy homeostasis, melanocyte stimulation, and immune modulation. These include several distinct melanotropins, lipotropins, and endorphins that are contained within the adrenocorticotrophin and beta-lipotropin peptides. The antimicrobial melanotropin alpha peptide exhibits antibacterial and antifungal activity. Mutations in this gene have been associated with early onset obesity, adrenal insufficiency, and red hair pigmentation. Alternatively spliced transcript variants encoding the same protein have been described. 5443 ENSG00000115138 POMC proopiomelanocortin
The protein encoded by this gene is found as a pentamer and is a major substrate for the cAMP-dependent protein kinase in cardiac muscle. The encoded protein is an inhibitor of cardiac muscle sarcoplasmic reticulum Ca(2+)-ATPase in the unphosphorylated state, but inhibition is relieved upon phosphorylation of the protein. The subsequent activation of the Ca(2+) pump leads to enhanced muscle relaxation rates, thereby contributing to the inotropic response elicited in heart by beta-agonists. The encoded protein is a key regulator of cardiac diastolic function. Mutations in this gene are a cause of inherited human dilated cardiomyopathy with refractory congestive heart failure, and also familial hypertrophic cardiomyopathy. 5350 ENSG00000198523 PLN phospholamban
Tight junctions represent one mode of cell-to-cell adhesion in epithelial or endothelial cell sheets, forming continuous seals around cells and serving as a physical barrier to prevent solutes and water from passing freely through the paracellular space. These junctions are comprised of sets of continuous networking strands in the outwardly facing cytoplasmic leaflet, with complementary grooves in the inwardly facing extracytoplasmic leaflet. The protein encoded by this gene, a member of the claudin family, is an integral membrane protein and a component of tight junction strands. Loss of function mutations result in neonatal ichthyosis-sclerosing cholangitis syndrome. 9076 ENSG00000163347 CLDN1 claudin 1
The protein encoded by this gene is a plasma glycoprotein of unknown function. The protein shows sequence similarity to the variable regions of some immunoglobulin supergene family member proteins. 1 ENSG00000121410 A1BG alpha-1-B glycoprotein
The protein encoded by this gene is a member of the somatotropin/prolactin family of hormones which play an important role in growth control. The gene, along with four other related genes, is located at the growth hormone locus on chromosome 17 where they are interspersed in the same transcriptional orientation; an arrangement which is thought to have evolved by a series of gene duplications. The five genes share a remarkably high degree of sequence identity. Alternative splicing generates additional isoforms of each of the five growth hormones, leading to further diversity and potential for specialization. This particular family member is expressed in the pituitary but not in placental tissue as is the case for the other four genes in the growth hormone locus. Mutations in or deletions of the gene lead to growth hormone deficiency and short stature. 2688 ENSG00000259384 GH1 growth hormone 1
The protein encoded by this gene is a member of the protein tyrosine phosphatase (PTP) family. PTPs are known to be signaling molecules that regulate a variety of cellular processes including cell growth, differentiation, mitotic cycle, and oncogenic transformation. This protein contains a C-terminal PTP domain and an N-terminal domain homologous to the band 4.1 superfamily of cytoskeletal-associated proteins. P97, a cell cycle regulator involved in a variety of membrane related functions, has been shown to be a substrate of this PTP. This PTP was also found to interact with, and be regulated by adaptor protein 14-3-3 beta. Several alternatively spliced transcript variants encoding different isoforms have been found for this gene. 5774 ENSG00000070159 PTPN3 protein tyrosine phosphatase, non-receptor type 3
This gene encodes a member of the paralemmin protein family. The product of this gene is a prenylated and palmitoylated phosphoprotein that associates with the cytoplasmic face of plasma membranes and is implicated in plasma membrane dynamics in neurons and other cell types. Several alternatively spliced transcript variants have been identified, but the full-length nature of only two transcript variants has been determined. 5064 ENSG00000099864 PALM paralemmin
NA 81618 ENSG00000135916 ITM2C integral membrane protein 2C
This gene encodes a Pleckstrin homology and SEC7 domains-containing protein that functions as a guanine nucleotide exchange factor. The encoded protein regulates signal transduction by activating ADP-ribosylation factor 6. Alternative splicing results in multiple transcript variants. 5662 ENSG00000059915 PSD pleckstrin and Sec7 domain containing
This gene encodes a member of the carboxypeptidase A family of zinc metalloproteases. This enzyme is produced in the pancreas and preferentially cleaves C-terminal branched-chain and aromatic amino acids from dietary proteins. This gene and several family members are present in a gene cluster on chromosome 7. Mutations in this gene may be linked to chronic pancreatitis, while elevated protein levels may be associated with pancreatic cancer. 1357 ENSG00000091704 CPA1 carboxypeptidase A1
The protein encoded by this gene is secreted and is a serine protease inhibitor whose targets include elastase, plasmin, thrombin, trypsin, chymotrypsin, and plasminogen activator. Defects in this gene can cause emphysema or liver disease. Several transcript variants encoding the same protein have been found for this gene. 5265 ENSG00000197249 SERPINA1 serpin family A member 1
NA 3488 ENSG00000115461 IGFBP5 insulin like growth factor binding protein 5
This gene encodes a plasma glycoprotein that binds heme with high affinity. The encoded protein is an acute phase protein that transports heme from the plasma to the liver and may be involved in protecting cells from oxidative stress. 3263 ENSG00000110169 HPX hemopexin
This gene encodes a member of the regulators of G protein signaling (RGS) family. The RGS proteins are signal transduction molecules which are involved in the regulation of heterotrimeric G proteins by acting as GTPase activators. This gene is a hypoxia-inducible factor-1 dependent, hypoxia-induced gene which is involved in the induction of endothelial apoptosis. This gene is also one of three genes on chromosome 1q contributing to elevated blood pressure. Alternatively spliced transcript variants have been identified. 8490 ENSG00000143248 RGS5 regulator of G-protein signaling 5
This gene is a member of the lipase gene family. It encodes a carboxyl esterase that hydrolyzes insoluble, emulsified triglycerides, and is essential for the efficient digestion of dietary fats. This gene is expressed specifically in the pancreas. 5406 ENSG00000175535 PNLIP pancreatic lipase
Carbonic anhydrases (CAs) are a large family of zinc metalloenzymes that catalyze the reversible hydration of carbon dioxide. They participate in a variety of biological processes, including respiration, calcification, acid-base balance, bone resorption, and the formation of aqueous humor, cerebrospinal fluid, saliva, and gastric acid. They show extensive diversity in tissue distribution and in their subcellular localization. CA XI is likely a secreted protein, however, radical changes at active site residues completely conserved in CA isozymes with catalytic activity, make it unlikely that it has carbonic anhydrase activity. It shares properties in common with two other acatalytic CA isoforms, CA VIII and CA X. CA XI is most abundantly expressed in brain, and may play a general role in the central nervous system. 770 ENSG00000063180 CA11 carbonic anhydrase 11
This gene encodes a member of the membrane-associated guanylate kinase (MAGUK) family. It heteromultimerizes with another MAGUK protein, DLG2, and is recruited into NMDA receptor and potassium channel clusters. These two MAGUK proteins may interact at postsynaptic sites to form a multimeric scaffold for the clustering of receptors, ion channels, and associated signaling proteins. Multiple transcript variants encoding different isoforms have been found for this gene. 1742 ENSG00000132535 DLG4 discs large MAGUK scaffold protein 4
This gene encodes a member of the phosphatidylethanolamine-binding family of proteins and has been shown to modulate multiple signaling pathways, including the MAP kinase (MAPK), NF-kappa B, and glycogen synthase kinase-3 (GSK-3) signaling pathways. The encoded protein can be further processed to form a smaller cleavage product, hippocampal cholinergic neurostimulating peptide (HCNP), which may be involved in neural development. This gene has been implicated in numerous human cancers and may act as a metastasis suppressor gene. Multiple pseudogenes of this gene have been identified in the genome. 5037 ENSG00000089220 PEBP1 phosphatidylethanolamine binding protein 1
NA 255743 ENSG00000168743 NPNT nephronectin
NA ENSG00000266844 ENSG00000266844 RP11-862L9.3 NA
This gene encodes an integral membrane protein that is secreted from intracellular zymogen granules and associates with the plasma membrane via glycosylphosphatidylinositol (GPI) linkage. The encoded protein binds pathogens such as enterobacteria, thereby playing an important role in the innate immune response. The C-terminus of this protein is related to the C-terminus of the protein encoded by the neighboring gene, uromodulin (UMOD). Alternative splicing results in multiple transcript variants. 2813 ENSG00000169347 GP2 glycoprotein 2
This gene encodes a member of the keratin family, the most diverse group of intermediate filaments. This gene product, a type I keratin, is usually found as a heterotetramer with two keratin 5 molecules, a type II keratin. Together they form the cytoskeleton of epithelial cells. Mutations in the genes for these keratins are associated with epidermolysis bullosa simplex. At least one pseudogene has been identified at 17p12-p11. 3861 ENSG00000186847 KRT14 keratin 14
NA ENSG00000268230 ENSG00000268230 CTD-2619J13.8 NA
This gene encodes a classical cadherin and member of the cadherin superfamily. Alternative splicing results in multiple transcript variants, at least one of which encodes a preproprotein that is proteolytically processed to generate a calcium-dependent cell adhesion molecule and glycoprotein. This protein plays a role in the establishment of left-right asymmetry, development of the nervous system and the formation of cartilage and bone. 1000 ENSG00000170558 CDH2 cadherin 2
Cardiac muscle myosin is a hexamer consisting of two heavy chain subunits, two light chain subunits, and two regulatory subunits. This gene encodes the alpha heavy chain subunit of cardiac myosin. The gene is located 4kb downstream of the gene encoding the beta heavy chain subunit of cardiac myosin. Mutations in this gene cause familial hypertrophic cardiomyopathy and atrial septal defect 3. 4624 ENSG00000197616 MYH6 myosin, heavy chain 6, cardiac muscle, alpha
The protein encoded by this gene is a member of the pexin family. It is found in serum and tissues and promotes cell adhesion and spreading, inhibits the membrane-damaging effect of the terminal cytolytic complement pathway, and binds to several serpin serine protease inhibitors. It is a secreted protein and exists in either a single chain form or a clipped, two chain form held together by a disulfide bond. 7448 ENSG00000109072 VTN vitronectin
This gene encodes a bifunctional signal transduction molecule. Dopaminergic and glutamatergic receptor stimulation regulates its phosphorylation and function as a kinase or phosphatase inhibitor. As a target for dopamine, this gene may serve as a therapeutic target for neurologic and psychiatric disorders. Multiple transcript variants encoding different isoforms have been found for this gene. 84152 ENSG00000131771 PPP1R1B protein phosphatase 1 regulatory inhibitor subunit 1B
The protein encoded by this gene is one of two large chain components of the assembly protein complex 2, which serves to link clathrin to receptors in coated vesicles. The encoded protein is found on the cytoplasmic face of coated vesicles in the plasma membrane. Two transcript variants encoding different isoforms have been found for this gene. 163 ENSG00000006125 AP2B1 adaptor related protein complex 2 beta 1 subunit
Arginase catalyzes the hydrolysis of arginine to ornithine and urea. At least two isoforms of mammalian arginase exist (types I and II) which differ in their tissue distribution, subcellular localization, immunologic crossreactivity and physiologic function. The type I isoform encoded by this gene, is a cytosolic enzyme and expressed predominantly in the liver as a component of the urea cycle. Inherited deficiency of this enzyme results in argininemia, an autosomal recessive disorder characterized by hyperammonemia. Two transcript variants encoding different isoforms have been found for this gene. 383 ENSG00000118520 ARG1 arginase 1
This gene encodes an enzyme involved in fatty acid biosynthesis, primarily the synthesis of oleic acid. The protein belongs to the fatty acid desaturase family and is an integral membrane protein located in the endoplasmic reticulum. Transcripts of approximately 3.9 and 5.2 kb, differing only by alternative polyadenylation signals, have been detected. A gene encoding a similar enzyme is located on chromosome 4 and a pseudogene of this gene is located on chromosome 17. 6319 ENSG00000099194 SCD stearoyl-CoA desaturase
This gene encodes a thiamine-dependent enzyme which plays a role in the channeling of excess sugar phosphates to glycolysis in the pentose phosphate pathway. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene. 7086 ENSG00000163931 TKT transketolase
This gene encodes one of the three alpha chains of type VI collagen, a beaded filament collagen found in most connective tissues. The product of this gene contains several domains similar to von Willebrand Factor type A domains. These domains have been shown to bind extracellular matrix proteins, an interaction that explains the importance of this collagen in organizing matrix components. Mutations in this gene are associated with Bethlem myopathy and Ullrich scleroatonic muscular dystrophy. Three transcript variants have been identified for this gene. 1292 ENSG00000142173 COL6A2 collagen type VI alpha 2
This gene encodes beta-tropomyosin, a member of the actin filament binding protein family, and mainly expressed in slow, type 1 muscle fibers. Mutations in this gene can alter the expression of other sarcomeric tropomyosin proteins, and cause cap disease, nemaline myopathy and distal arthrogryposis syndromes. Alternatively spliced transcript variants encoding different isoforms have been found for this gene. 7169 ENSG00000198467 TPM2 tropomyosin 2 (beta)
NA 4495 ENSG00000125144 MT1G metallothionein 1G
NA ENSG00000225670 ENSG00000225670 CADM3-AS1 CADM3 antisense RNA 1
This gene encodes a member of the cytochrome P450 superfamily of enzymes. The cytochrome P450 proteins are monooxygenases which catalyze many reactions involved in drug metabolism and synthesis of cholesterol, steroids and other lipids. The encoded protein metabolizes drugs as well as the steroid hormones testosterone and progesterone. This gene is part of a cluster of cytochrome P450 genes on chromosome 7q21.1. Two pseudogenes of this gene have been identified within this cluster on chromosome 7. Expression of this gene is widely variable among populations, and a single nucleotide polymorphism that affects transcript splicing has been associated with susceptibility to hypertension. Alternative splicing results in multiple transcript variants. 1577 ENSG00000106258 CYP3A5 cytochrome P450 family 3 subfamily A member 5
This gene encodes a protein that belongs to the microtubule-associated protein family. The proteins of this family are thought to be involved in microtubule assembly, which is an essential step in neurogenesis. The product of this gene is a precursor polypeptide that presumably undergoes proteolytic processing to generate the final MAP1A heavy chain and LC2 light chain. Expression of this gene is almost exclusively in the brain. Studies of the rat microtubule-associated protein 1A gene suggested a role in early events of spinal cord development. 4130 ENSG00000166963 MAP1A microtubule associated protein 1A
This gene encodes a member of the FXYD family of transmembrane proteins. This particular protein encodes phosphohippolin, which likely affects the activity of Na,K-ATPase. Multiple alternatively spliced transcript variants encoding the same protein have been described. Related pseudogenes have been identified on chromosomes 10 and X. Read-through transcripts have been observed between this locus and the downstream sodium/potassium-transporting ATPase subunit gamma (FXYD2, GeneID 486) locus. 53826 ENSG00000137726 FXYD6 FXYD domain containing ion transport regulator 6
Elastases form a subfamily of serine proteases that hydrolyze many proteins in addition to elastin. Humans have six elastase genes which encode the structurally similar proteins elastase 1, 2, 2A, 2B, 3A, and 3B. Unlike other elastases, elastase 3A has little elastolytic activity. Like most of the human elastases, elastase 3A is secreted from the pancreas as a zymogen and, like other serine proteases such as trypsin, chymotrypsin and kallikrein, it has a digestive function in the intestine. Elastase 3A preferentially cleaves proteins after alanine residues. Elastase 3A may also function in the intestinal transport and metabolism of cholesterol. Both elastase 3A and elastase 3B have been referred to as protease E and as elastase 1. 10136 ENSG00000142789 CELA3A chymotrypsin like elastase family member 3A
This gene encodes an integral membrane protein that is a major component of myelin in the peripheral nervous system. Studies suggest two alternately used promoters drive tissue-specific expression. Various mutations of this gene are causes of Charcot-Marie-Tooth disease Type IA, Dejerine-Sottas syndrome, and hereditary neuropathy with liability to pressure palsies. Alternative splicing results in multiple transcript variants. 5376 ENSG00000109099 PMP22 peripheral myelin protein 22
The protein encoded by this gene is a member of the keratin gene family. The keratins are intermediate filament proteins responsible for the structural integrity of epithelial cells and are subdivided into cytokeratins and hair keratins. Most of the type I cytokeratins consist of acidic proteins which are arranged in pairs of heterotypic keratin chains. This type I cytokeratin is paired with keratin 4 and expressed in the suprabasal layers of non-cornified stratified epithelia. Mutations in this gene and keratin 4 have been associated with the autosomal dominant disorder White Sponge Nevus. The type I cytokeratins are clustered in a region of chromosome 17q21.2. Alternative splicing of this gene results in multiple transcript variants; however, not all variants have been described. 3860 ENSG00000171401 KRT13 keratin 13
This gene encodes a member of the dynamin subfamily of GTP-binding proteins. The encoded protein possesses unique mechanochemical properties used to tubulate and sever membranes, and is involved in clathrin-mediated endocytosis and other vesicular trafficking processes. Actin and other cytoskeletal proteins act as binding partners for the encoded protein, which can also self-assemble leading to stimulation of GTPase activity. More than sixty highly conserved copies of the 3’ region of this gene are found elsewhere in the genome, particularly on chromosomes Y and 15. Alternatively spliced transcript variants encoding different isoforms have been described. 1759 ENSG00000106976 DNM1 dynamin 1
Actins are highly conserved proteins that are involved in various types of cell motility and in the maintenance of the cytoskeleton. Three types of actins, alpha, beta and gamma, have been identified in vertebrates. Alpha actins are found in muscle tissues and are a major constituent of the contractile apparatus. The beta and gamma actins co-exist in most cell types as components of the cytoskeleton and as mediators of internal cell motility. This gene encodes actin gamma 2; a smooth muscle actin found in enteric tissues. Alternative splicing results in multiple transcript variants encoding distinct isoforms. Based on similarity to peptide cleavage of related actins, the mature protein of this gene is formed by removal of two N-terminal peptides. 72 ENSG00000163017 ACTG2 actin, gamma 2, smooth muscle, enteric
# Save the Ensembl IDs queried for factor 2, one ID per line.
write.table(out$query, paste0("../utilities/GTEX2013_sparse_fac_sqrt/gene_names_clus_", 2, ".txt"),
            col.names = FALSE, row.names = FALSE, quote = FALSE);
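
The same export step is repeated for each factor with only the factor index changed. A minimal sketch of how it could be wrapped in a helper is given here; `write_factor_genes`, its arguments, and the choice to drop duplicate and missing IDs are assumptions made for illustration and are not part of the original analysis.

```r
# Hypothetical helper (not in the original analysis): write the gene IDs
# of one factor to "gene_names_clus_<factor_index>.txt" inside out_dir.
write_factor_genes <- function(ids, factor_index, out_dir) {
  # Assumption: duplicate and missing IDs are not wanted in the output file.
  ids <- unique(as.character(ids))
  ids <- ids[!is.na(ids)]
  path <- file.path(out_dir, paste0("gene_names_clus_", factor_index, ".txt"))
  # Same write.table settings as the per-factor calls in this report.
  write.table(ids, path, col.names = FALSE, row.names = FALSE, quote = FALSE)
  invisible(path)
}

# Usage with a temporary directory and two IDs taken from the tables above.
demo_dir <- tempfile("gene_lists_")
dir.create(demo_dir)
demo_path <- write_factor_genes(c("ENSG00000171401", "ENSG00000170477"), 2, demo_dir)
readLines(demo_path)
```

In the report itself, the call would presumably take `out$query` and the `../utilities/GTEX2013_sparse_fac_sqrt` directory in place of the demonstration values.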

Factor 3 Annotations

out <- mygene::queryMany(gene_list[3,],  scopes="ensembl.gene", fields=c("name", "summary", "symbol"), species="human");
## Finished
## Pass returnall=TRUE to return lists of duplicate or missing query terms.
kable(as.data.frame(out))
symbol X_id name query summary notfound
MBP 4155 myelin basic protein ENSG00000197971 The protein encoded by the classic MBP gene is a major constituent of the myelin sheath of oligodendrocytes and Schwann cells in the nervous system. However, MBP-related transcripts are also present in the bone marrow and the immune system. These mRNAs arise from the long MBP gene (otherwise called ‘Golli-MBP’) that contains 3 additional exons located upstream of the classic MBP exons. Alternative splicing from the Golli and the MBP transcription start sites gives rise to 2 sets of MBP-related transcripts and gene products. The Golli mRNAs contain 3 exons unique to Golli-MBP, spliced in-frame to 1 or more MBP exons. They encode hybrid proteins that have N-terminal Golli aa sequence linked to MBP aa sequence. The second family of transcripts contain only MBP exons and produce the well characterized myelin basic proteins. This complex gene structure is conserved among species suggesting that the MBP transcription unit is an integral part of the Golli transcription unit and that this arrangement is important for the function and/or regulation of these genes. NA
MYH7 4625 myosin, heavy chain 7, cardiac muscle, beta ENSG00000092054 Muscle myosin is a hexameric protein containing 2 heavy chain subunits, 2 alkali light chain subunits, and 2 regulatory light chain subunits. This gene encodes the beta (or slow) heavy chain subunit of cardiac myosin. It is expressed predominantly in normal human ventricle. It is also expressed in skeletal muscle tissues rich in slow-twitch type I muscle fibers. Changes in the relative abundance of this protein and the alpha (or fast) heavy subunit of cardiac myosin correlate with the contractile velocity of cardiac muscle. Its expression is also altered during thyroid hormone depletion and hemodynamic overloading. Mutations in this gene are associated with familial hypertrophic cardiomyopathy, myosin storage myopathy, dilated cardiomyopathy, and Laing early-onset distal myopathy. NA
RP11-862L9.3 ENSG00000266844 NA ENSG00000266844 NA NA
FTL 2512 ferritin, light polypeptide ENSG00000087086 This gene encodes the light subunit of the ferritin protein. Ferritin is the major intracellular iron storage protein in prokaryotes and eukaryotes. It is composed of 24 subunits of the heavy and light ferritin chains. Variation in ferritin subunit composition may affect the rates of iron uptake and release in different tissues. A major function of ferritin is the storage of iron in a soluble and nontoxic state. Defects in this light chain ferritin gene are associated with several neurodegenerative diseases and hyperferritinemia-cataract syndrome. This gene has multiple pseudogenes. NA
DES 1674 desmin ENSG00000175084 This gene encodes a muscle-specific class III intermediate filament. Homopolymers of this protein form a stable intracytoplasmic filamentous network connecting myofibrils to each other and to the plasma membrane. Mutations in this gene are associated with desmin-related myopathy, a familial cardiac and skeletal myopathy (CSM), and with distal myopathies. NA
MYH11 4629 myosin, heavy chain 11, smooth muscle ENSG00000133392 The protein encoded by this gene is a smooth muscle myosin belonging to the myosin heavy chain family. The gene product is a subunit of a hexameric protein that consists of two heavy chain subunits and two pairs of non-identical light chain subunits. It functions as a major contractile protein, converting chemical energy into mechanical energy through the hydrolysis of ATP. The gene encoding a human ortholog of rat NUDE1 is transcribed from the reverse strand of this gene, and its 3’ end overlaps with that of the latter. The pericentric inversion of chromosome 16 [inv(16)(p13q22)] produces a chimeric transcript that encodes a protein consisting of the first 165 residues from the N terminus of core-binding factor beta in a fusion with the C-terminal portion of the smooth muscle myosin heavy chain. This chromosomal rearrangement is associated with acute myeloid leukemia of the M4Eo subtype. Alternative splicing generates isoforms that are differentially expressed, with ratios changing during muscle cell maturation. Alternatively spliced transcript variants encoding different isoforms have been identified. NA
GFAP 2670 glial fibrillary acidic protein ENSG00000131095 This gene encodes one of the major intermediate filament proteins of mature astrocytes. It is used as a marker to distinguish astrocytes from other glial cells during development. Mutations in this gene cause Alexander disease, a rare disorder of astrocytes in the central nervous system. Alternative splicing results in multiple transcript variants encoding distinct isoforms. NA
TPO 7173 thyroid peroxidase ENSG00000115705 This gene encodes a membrane-bound glycoprotein. The encoded protein acts as an enzyme and plays a central role in thyroid gland function. The protein functions in the iodination of tyrosine residues in thyroglobulin and phenoxy-ester formation between pairs of iodinated tyrosines to generate the thyroid hormones, thyroxine and triiodothyronine. Mutations in this gene are associated with several disorders of thyroid hormonogenesis, including congenital hypothyroidism, congenital goiter, and thyroid hormone organification defect IIA. Multiple transcript variants encoding distinct isoforms have been identified for this gene, but the full-length nature of some variants has not been determined. NA
TG 7038 thyroglobulin ENSG00000042832 Thyroglobulin (Tg) is a glycoprotein homodimer produced predominantly by the thyroid gland. It acts as a substrate for the synthesis of thyroxine and triiodothyronine as well as the storage of the inactive forms of thyroid hormone and iodine. Thyroglobulin is secreted from the endoplasmic reticulum to its site of iodination, and subsequent thyroxine biosynthesis, in the follicular lumen. Mutations in this gene cause thyroid dyshormonogenesis, manifested as goiter, and are associated with moderate to severe congenital hypothyroidism. Polymorphisms in this gene are associated with susceptibility to autoimmune thyroid diseases (AITD) such as Graves disease and Hashimoto thyroiditis. NA
CYP11B1 1584 cytochrome P450 family 11 subfamily B member 1 ENSG00000160882 This gene encodes a member of the cytochrome P450 superfamily of enzymes. The cytochrome P450 proteins are monooxygenases which catalyze many reactions involved in drug metabolism and synthesis of cholesterol, steroids and other lipids. This protein localizes to the mitochondrial inner membrane and is involved in the conversion of progesterone to cortisol in the adrenal cortex. Mutations in this gene cause congenital adrenal hyperplasia due to 11-beta-hydroxylase deficiency. Transcript variants encoding different isoforms have been noted for this gene. NA
SERPINA1 5265 serpin family A member 1 ENSG00000197249 The protein encoded by this gene is secreted and is a serine protease inhibitor whose targets include elastase, plasmin, thrombin, trypsin, chymotrypsin, and plasminogen activator. Defects in this gene can cause emphysema or liver disease. Several transcript variants encoding the same protein have been found for this gene. NA
IGFBP5 3488 insulin like growth factor binding protein 5 ENSG00000115461 NA NA
KRT14 3861 keratin 14 ENSG00000186847 This gene encodes a member of the keratin family, the most diverse group of intermediate filaments. This gene product, a type I keratin, is usually found as a heterotetramer with two keratin 5 molecules, a type II keratin. Together they form the cytoskeleton of epithelial cells. Mutations in the genes for these keratins are associated with epidermolysis bullosa simplex. At least one pseudogene has been identified at 17p12-p11. NA
LPAR1 1902 lysophosphatidic acid receptor 1 ENSG00000198121 The integral membrane protein encoded by this gene is a lysophosphatidic acid (LPA) receptor from a group known as EDG receptors. These receptors are members of the G protein-coupled receptor superfamily. Utilized by LPA for cell signaling, EDG receptors mediate diverse biologic functions, including proliferation, platelet aggregation, smooth muscle contraction, inhibition of neuroblastoma cell differentiation, chemotaxis, and tumor cell invasion. Two transcript variants encoding the same protein have been identified for this gene. NA
CYP17A1 1586 cytochrome P450 family 17 subfamily A member 1 ENSG00000148795 This gene encodes a member of the cytochrome P450 superfamily of enzymes. The cytochrome P450 proteins are monooxygenases which catalyze many reactions involved in drug metabolism and synthesis of cholesterol, steroids and other lipids. This protein localizes to the endoplasmic reticulum. It has both 17alpha-hydroxylase and 17,20-lyase activities and is a key enzyme in the steroidogenic pathway that produces progestins, mineralocorticoids, glucocorticoids, androgens, and estrogens. Mutations in this gene are associated with isolated steroid-17 alpha-hydroxylase deficiency, 17-alpha-hydroxylase/17,20-lyase deficiency, pseudohermaphroditism, and adrenal hyperplasia. NA
ACTA2 59 actin, alpha 2, smooth muscle, aorta ENSG00000107796 The protein encoded by this gene belongs to the actin family of proteins, which are highly conserved proteins that play a role in cell motility, structure and integrity. Alpha, beta and gamma actin isoforms have been identified, with alpha actins being a major constituent of the contractile apparatus, while beta and gamma actins are involved in the regulation of cell motility. This actin is an alpha actin that is found in skeletal muscle. Defects in this gene cause aortic aneurysm familial thoracic type 6. Multiple alternatively spliced variants, encoding the same protein, have been identified. NA
ALB 213 albumin ENSG00000163631 Albumin is a soluble, monomeric protein which comprises about one-half of the blood serum protein. Albumin functions primarily as a carrier protein for steroids, fatty acids, and thyroid hormones and plays a role in stabilizing extracellular fluid volume. Albumin is a globular unglycosylated serum protein of molecular weight 65,000. Albumin is synthesized in the liver as preproalbumin which has an N-terminal peptide that is removed before the nascent protein is released from the rough endoplasmic reticulum. The product, proalbumin, is in turn cleaved in the Golgi vesicles to produce the secreted albumin. NA
QDPR 5860 quinoid dihydropteridine reductase ENSG00000151552 This gene encodes the enzyme dihydropteridine reductase, which catalyzes the NADH-mediated reduction of quinonoid dihydrobiopterin. This enzyme is an essential component of the pterin-dependent aromatic amino acid hydroxylating systems. Mutations in this gene resulting in QDPR deficiency include aberrant splicing, amino acid substitutions, insertions, or premature terminations. Dihydropteridine reductase deficiency presents as atypical phenylketonuria due to insufficient production of biopterin, a cofactor for phenylalanine hydroxylase. NA
SAA1 6288 serum amyloid A1 ENSG00000173432 This gene encodes a member of the serum amyloid A family of apolipoproteins. The encoded preproprotein is proteolytically processed to generate the mature protein. This protein is a major acute phase protein that is highly expressed in response to inflammation and tissue injury. This protein also plays an important role in HDL metabolism and cholesterol homeostasis. High levels of this protein are associated with chronic inflammatory diseases including atherosclerosis, rheumatoid arthritis, Alzheimer’s disease and Crohn’s disease. This protein may also be a potential biomarker for certain tumors. Alternate splicing results in multiple transcript variants that encode the same protein. A pseudogene of this gene is found on chromosome 11. NA
TNNT2 7139 troponin T2, cardiac type ENSG00000118194 The protein encoded by this gene is the tropomyosin-binding subunit of the troponin complex, which is located on the thin filament of striated muscles and regulates muscle contraction in response to alterations in intracellular calcium ion concentration. Mutations in this gene have been associated with familial hypertrophic cardiomyopathy as well as with dilated cardiomyopathy. Transcripts for this gene undergo alternative splicing that results in many tissue-specific isoforms; however, the full-length nature of some of these variants has not yet been determined. NA
APOD 347 apolipoprotein D ENSG00000189058 This gene encodes a component of high density lipoprotein that has no marked similarity to other apolipoprotein sequences. It has a high degree of homology to plasma retinol-binding protein and other members of the alpha 2 microglobulin protein superfamily of carrier proteins, also known as lipocalins. This glycoprotein is closely associated with the enzyme lecithin:cholesterol acyltransferase - an enzyme involved in lipoprotein metabolism. NA
RBP4 5950 retinol binding protein 4 ENSG00000138207 This protein belongs to the lipocalin family and is the specific carrier for retinol (vitamin A alcohol) in the blood. It delivers retinol from the liver stores to the peripheral tissues. In plasma, the RBP-retinol complex interacts with transthyretin which prevents its loss by filtration through the kidney glomeruli. A deficiency of vitamin A blocks secretion of the binding protein posttranslationally and results in defective delivery and supply to the epidermal cells. NA
HBB 3043 hemoglobin subunit beta ENSG00000244734 The alpha (HBA) and beta (HBB) loci determine the structure of the 2 types of polypeptide chains in adult hemoglobin, Hb A. The normal adult hemoglobin tetramer consists of two alpha chains and two beta chains. Mutant beta globin causes sickle cell anemia. Absence of beta chain causes beta-zero-thalassemia. Reduced amounts of detectable beta globin causes beta-plus-thalassemia. The order of the genes in the beta-globin cluster is 5’-epsilon – gamma-G – gamma-A – delta – beta–3’. NA
ACTG2 72 actin, gamma 2, smooth muscle, enteric ENSG00000163017 Actins are highly conserved proteins that are involved in various types of cell motility and in the maintenance of the cytoskeleton. Three types of actins, alpha, beta and gamma, have been identified in vertebrates. Alpha actins are found in muscle tissues and are a major constituent of the contractile apparatus. The beta and gamma actins co-exist in most cell types as components of the cytoskeleton and as mediators of internal cell motility. This gene encodes actin gamma 2; a smooth muscle actin found in enteric tissues. Alternative splicing results in multiple transcript variants encoding distinct isoforms. Based on similarity to peptide cleavage of related actins, the mature protein of this gene is formed by removal of two N-terminal peptides. NA
ABCA2 20 ATP binding cassette subfamily A member 2 ENSG00000107331 The membrane-associated protein encoded by this gene is a member of the superfamily of ATP-binding cassette (ABC) transporters. ABC proteins transport various molecules across extra- and intracellular membranes. ABC genes are divided into seven distinct subfamilies (ABC1, MDR/TAP, MRP, ALD, OABP, GCN20, White). This protein is a member of the ABC1 subfamily. Members of the ABC1 subfamily comprise the only major ABC subfamily found exclusively in multicellular eukaryotes. This protein is highly expressed in brain tissue and may play a role in macrophage lipid metabolism and neural development. Two transcript variants encoding different isoforms have been found for this gene. NA
SAA2 6289 serum amyloid A2 ENSG00000134339 NA NA
TGM2 7052 transglutaminase 2 ENSG00000198959 Transglutaminases are enzymes that catalyze the crosslinking of proteins by epsilon-gamma glutamyl lysine isopeptide bonds. While the primary structure of transglutaminases is not conserved, they all have the same amino acid sequence at their active sites and their activity is calcium-dependent. The protein encoded by this gene acts as a monomer, is induced by retinoic acid, and appears to be involved in apoptosis. Finally, the encoded protein is the autoantigen implicated in celiac disease. Two transcript variants encoding different isoforms have been found for this gene. NA
MAP4 4134 microtubule associated protein 4 ENSG00000047849 The protein encoded by this gene is a major non-neuronal microtubule-associated protein. This protein contains a domain similar to the microtubule-binding domains of neuronal microtubule-associated protein (MAP2) and microtubule-associated protein tau (MAPT/TAU). This protein promotes microtubule assembly, and has been shown to counteract destabilization of interphase microtubule catastrophe promotion. Cyclin B was found to interact with this protein, which targets cell division cycle 2 (CDC2) kinase to microtubules. The phosphorylation of this protein affects microtubule properties and cell cycle progression. Multiple transcript variants encoding different isoforms have been found for this gene. NA
SERPINE1 5054 serpin family E member 1 ENSG00000106366 This gene encodes a member of the serine proteinase inhibitor (serpin) superfamily. This member is the principal inhibitor of tissue plasminogen activator (tPA) and urokinase (uPA), and hence is an inhibitor of fibrinolysis. Defects in this gene are the cause of plasminogen activator inhibitor-1 deficiency (PAI-1 deficiency), and high concentrations of the gene product are associated with thrombophilia. Alternatively spliced transcript variants encoding different isoforms have been found for this gene. NA
KRT8 3856 keratin 8 ENSG00000170421 This gene is a member of the type II keratin family clustered on the long arm of chromosome 12. Type I and type II keratins heteropolymerize to form intermediate-sized filaments in the cytoplasm of epithelial cells. The product of this gene typically dimerizes with keratin 18 to form an intermediate filament in simple single-layered epithelial cells. This protein plays a role in maintaining cellular structural integrity and also functions in signal transduction and cellular differentiation. Mutations in this gene cause cryptogenic cirrhosis. Alternatively spliced transcript variants have been found for this gene. NA
MYH6 4624 myosin, heavy chain 6, cardiac muscle, alpha ENSG00000197616 Cardiac muscle myosin is a hexamer consisting of two heavy chain subunits, two light chain subunits, and two regulatory subunits. This gene encodes the alpha heavy chain subunit of cardiac myosin. The gene is located 4kb downstream of the gene encoding the beta heavy chain subunit of cardiac myosin. Mutations in this gene cause familial hypertrophic cardiomyopathy and atrial septal defect 3. NA
RP11-394O4.5 ENSG00000269936 NA ENSG00000269936 NA NA
SPINK5 11005 serine peptidase inhibitor, Kazal type 5 ENSG00000133710 This gene encodes a multidomain serine protease inhibitor that contains 15 potential inhibitory domains. The encoded preproprotein is proteolytically processed to generate multiple protein products, which may exhibit unique activities and specificities. These proteins may play a role in skin and hair morphogenesis, as well as anti-inflammatory and antimicrobial protection of mucous epithelia. Mutations in this gene may result in Netherton syndrome, a disorder characterized by ichthyosis, defective cornification, and atopy. This gene is present in a gene cluster on chromosome 5. Alternative splicing results in multiple transcript variants. NA
MPZ 4359 myelin protein zero ENSG00000158887 This gene is specifically expressed in Schwann cells of the peripheral nervous system and encodes a type I transmembrane glycoprotein that is a major structural protein of the peripheral myelin sheath. The encoded protein contains a large hydrophobic extracellular domain and a smaller basic intracellular domain, which are essential for the formation and stabilization of the multilamellar structure of the compact myelin. Mutations in this gene are associated with autosomal dominant form of Charcot-Marie-Tooth disease type 1 (CMT1B) and other polyneuropathies, such as Dejerine-Sottas syndrome (DSS) and congenital hypomyelinating neuropathy (CHN). A recent study showed that two isoforms are produced from the same mRNA by use of alternative in-frame translation termination codons via a stop codon readthrough mechanism. NA
GLUL 2752 glutamate-ammonia ligase ENSG00000135821 The protein encoded by this gene belongs to the glutamine synthetase family. It catalyzes the synthesis of glutamine from glutamate and ammonia in an ATP-dependent reaction. This protein plays a role in ammonia and glutamate detoxification, acid-base homeostasis, cell signaling, and cell proliferation. Glutamine is an abundant amino acid, and is important to the biosynthesis of several amino acids, pyrimidines, and purines. Mutations in this gene are associated with congenital glutamine deficiency, and overexpression of this gene was observed in some primary liver cancer samples. There are six pseudogenes of this gene found on chromosomes 2, 5, 9, 11, and 12. Alternative splicing results in multiple transcript variants. NA
SAA2-SAA4 100528017 SAA2-SAA4 readthrough ENSG00000255071 This locus represents naturally occurring read-through transcription between the neighboring serum amyloid A2 and serum amyloid A4 genes on chromosome 11. The read-through transcript produces a fusion protein that shares sequence identity with each individual gene product. NA
NDRG2 57447 NDRG family member 2 ENSG00000165795 This gene is a member of the N-myc downregulated gene family which belongs to the alpha/beta hydrolase superfamily. The protein encoded by this gene is a cytoplasmic protein that may play a role in neurite outgrowth. This gene may be involved in glioblastoma carcinogenesis. Several alternatively spliced transcript variants of this gene have been described, but the full-length nature of some of these variants has not been determined. NA
ALDOB 229 aldolase, fructose-bisphosphate B ENSG00000136872 Fructose-1,6-bisphosphate aldolase (EC 4.1.2.13) is a tetrameric glycolytic enzyme that catalyzes the reversible conversion of fructose-1,6-bisphosphate to glyceraldehyde 3-phosphate and dihydroxyacetone phosphate. Vertebrates have 3 aldolase isozymes which are distinguished by their electrophoretic and catalytic properties. Differences indicate that aldolases A, B, and C are distinct proteins, the products of a family of related ‘housekeeping’ genes exhibiting developmentally regulated expression of the different isozymes. The developing embryo produces aldolase A, which is produced in even greater amounts in adult muscle where it can be as much as 5% of total cellular protein. In adult liver, kidney and intestine, aldolase A expression is repressed and aldolase B is produced. In brain and other nervous tissue, aldolase A and C are expressed about equally. There is a high degree of homology between aldolase A and C. Defects in ALDOB cause hereditary fructose intolerance. NA
MYL7 58498 myosin light chain 7 ENSG00000106631 NA NA
MFGE8 4240 milk fat globule-EGF factor 8 protein ENSG00000140545 This gene encodes a preproprotein that is proteolytically processed to form multiple protein products. The major encoded protein product, lactadherin, is a membrane glycoprotein that promotes phagocytosis of apoptotic cells. This protein has also been implicated in wound healing, autoimmune disease, and cancer. Lactadherin can be further processed to form a smaller cleavage product, medin, which comprises the major protein component of aortic medial amyloid (AMA). Alternative splicing results in multiple transcript variants. NA
HP 3240 haptoglobin ENSG00000257017 This gene encodes a preproprotein, which is processed to yield both alpha and beta chains, which subsequently combine as a tetramer to produce haptoglobin. Haptoglobin functions to bind free plasma hemoglobin, which allows degradative enzymes to gain access to the hemoglobin, while at the same time preventing loss of iron through the kidneys and protecting the kidneys from damage by hemoglobin. Mutations in this gene and/or its regulatory regions cause ahaptoglobinemia or hypohaptoglobinemia. This gene has also been linked to diabetic nephropathy, the incidence of coronary artery disease in type 1 diabetes, Crohn’s disease, inflammatory disease behavior, primary sclerosing cholangitis, susceptibility to idiopathic Parkinson’s disease, and a reduced incidence of Plasmodium falciparum malaria. The protein encoded also exhibits antimicrobial activity against bacteria. A similar duplicated gene is located next to this gene on chromosome 16. Multiple transcript variants encoding different isoforms have been found for this gene. NA
MYL2 4633 myosin light chain 2 ENSG00000111245 This gene encodes the regulatory light chain associated with cardiac myosin beta (or slow) heavy chain. Ca2+ triggers the phosphorylation of regulatory light chain that in turn triggers contraction. Mutations in this gene are associated with mid-left ventricular chamber type hypertrophic cardiomyopathy. NA
HBA2 3040 hemoglobin subunit alpha 2 ENSG00000188536 The human alpha globin gene cluster located on chromosome 16 spans about 30 kb and includes seven loci: 5’- zeta - pseudozeta - mu - pseudoalpha-1 - alpha-2 - alpha-1 - theta - 3’. The alpha-2 (HBA2) and alpha-1 (HBA1) coding sequences are identical. These genes differ slightly over the 5’ untranslated regions and the introns, but they differ significantly over the 3’ untranslated regions. Two alpha chains plus two beta chains constitute HbA, which in normal adult life comprises about 97% of the total hemoglobin; alpha chains combine with delta chains to constitute HbA-2, which with HbF (fetal hemoglobin) makes up the remaining 3% of adult hemoglobin. Alpha thalassemias result from deletions of each of the alpha genes as well as deletions of both HBA2 and HBA1; some nondeletion alpha thalassemias have also been reported. NA
FEZ1 9638 fasciculation and elongation protein zeta 1 ENSG00000149557 This gene is an ortholog of the C. elegans unc-76 gene, which is necessary for normal axonal bundling and elongation within axon bundles. Expression of this gene in C. elegans unc-76 mutants can restore to the mutants partial locomotion and axonal fasciculation, suggesting that it also functions in axonal outgrowth. The N-terminal half of the gene product is highly acidic. Alternatively spliced transcript variants encoding different isoforms of this protein have been described. NA
FNBP1 23048 formin binding protein 1 ENSG00000187239 The protein encoded by this gene is a member of the formin-binding-protein family. The protein contains an N-terminal Fer/Cdc42-interacting protein 4 (CIP4) homology (FCH) domain followed by a coiled-coil domain, a proline-rich motif, a second coiled-coil domain, a Rho family protein-binding domain (RBD), and a C-terminal SH3 domain. This protein binds sorting nexin 2 (SNX2), tankyrase (TNKS), and dynamin; an interaction between this protein and formin has not been demonstrated yet in human. NA
COL1A2 1278 collagen type I alpha 2 chain ENSG00000164692 This gene encodes the pro-alpha2 chain of type I collagen whose triple helix comprises two alpha1 chains and one alpha2 chain. Type I is a fibril-forming collagen found in most connective tissues and is abundant in bone, cornea, dermis and tendon. Mutations in this gene are associated with osteogenesis imperfecta types I-IV, Ehlers-Danlos syndrome type VIIB, recessive Ehlers-Danlos syndrome Classical type, idiopathic osteoporosis, and atypical Marfan syndrome. Symptoms associated with mutations in this gene, however, tend to be less severe than mutations in the gene for the alpha1 chain of type I collagen (COL1A1) reflecting the different role of alpha2 chains in matrix integrity. Three transcripts, resulting from the use of alternate polyadenylation signals, have been identified for this gene. NA
MTCO1P12 ENSG00000237973 MT-CO1 pseudogene 12 ENSG00000237973 NA NA
HBA1 3039 hemoglobin subunit alpha 1 ENSG00000206172 The human alpha globin gene cluster located on chromosome 16 spans about 30 kb and includes seven loci: 5’- zeta - pseudozeta - mu - pseudoalpha-1 - alpha-2 - alpha-1 - theta - 3’. The alpha-2 (HBA2) and alpha-1 (HBA1) coding sequences are identical. These genes differ slightly over the 5’ untranslated regions and the introns, but they differ significantly over the 3’ untranslated regions. Two alpha chains plus two beta chains constitute HbA, which in normal adult life comprises about 97% of the total hemoglobin; alpha chains combine with delta chains to constitute HbA-2, which with HbF (fetal hemoglobin) makes up the remaining 3% of adult hemoglobin. Alpha thalassemias result from deletions of each of the alpha genes as well as deletions of both HBA2 and HBA1; some nondeletion alpha thalassemias have also been reported. NA
TPM1 7168 tropomyosin 1 (alpha) ENSG00000140416 This gene is a member of the tropomyosin family of highly conserved, widely distributed actin-binding proteins involved in the contractile system of striated and smooth muscles and the cytoskeleton of non-muscle cells. Tropomyosin is composed of two alpha-helical chains arranged as a coiled-coil. It is polymerized end to end along the two grooves of actin filaments and provides stability to the filaments. The encoded protein is one type of alpha helical chain that forms the predominant tropomyosin of striated muscle, where it also functions in association with the troponin complex to regulate the calcium-dependent interaction of actin and myosin during muscle contraction. In smooth muscle and non-muscle cells, alternatively spliced transcript variants encoding a range of isoforms have been described. Mutations in this gene are associated with type 3 familial hypertrophic cardiomyopathy. NA
ZCCHC24 219654 zinc finger CCHC-type containing 24 ENSG00000165424 NA NA
ZEB2 9839 zinc finger E-box binding homeobox 2 ENSG00000169554 The protein encoded by this gene is a member of the Zfh1 family of 2-handed zinc finger/homeodomain proteins. It is located in the nucleus and functions as a DNA-binding transcriptional repressor that interacts with activated SMADs. Mutations in this gene are associated with Hirschsprung disease/Mowat-Wilson syndrome. Alternatively spliced transcript variants have been found for this gene. NA
C1orf198 84886 chromosome 1 open reading frame 198 ENSG00000119280 NA NA
SCARB1 949 scavenger receptor class B member 1 ENSG00000073060 The protein encoded by this gene is a plasma membrane receptor for high density lipoprotein cholesterol (HDL). The encoded protein mediates cholesterol transfer to and from HDL. In addition, this protein is a receptor for hepatitis C virus glycoprotein E2. Two transcript variants encoding different isoforms have been found for this gene. NA
NBEAL2 23218 neurobeachin like 2 ENSG00000160796 The protein encoded by this gene contains a beige and Chediak-Higashi (BEACH) domain and multiple WD40 domains, and may play a role in megakaryocyte alpha-granule biogenesis. Mutations in this gene are a cause of gray platelet syndrome. NA
NA NA NA ENSG00000117289 NA TRUE
NBEA 26960 neurobeachin ENSG00000172915 This gene encodes a member of a large, diverse group of A-kinase anchor proteins that target the activity of protein kinase A to specific subcellular sites by binding to its type II regulatory subunits. Brain-specific expression and coat protein-like membrane recruitment of a highly similar protein in mouse suggest an involvement in neuronal post-Golgi membrane traffic. Mutations in this gene may be associated with a form of autism. This gene and its expression are frequently disrupted in patients with multiple myeloma. Alternative splicing results in multiple transcript variants encoding distinct isoforms. Additional transcript variants may exist, but their full-length nature has not been determined. NA
CRYAB 1410 crystallin alpha B ENSG00000109846 Mammalian lens crystallins are divided into alpha, beta, and gamma families. Alpha crystallins are composed of two gene products: alpha-A and alpha-B, for acidic and basic, respectively. Alpha crystallins can be induced by heat shock and are members of the small heat shock protein (HSP20) family. They act as molecular chaperones although they do not renature proteins and release them in the fashion of a true chaperone; instead they hold them in large soluble aggregates. Post-translational modifications decrease the ability to chaperone. These heterogeneous aggregates consist of 30-40 subunits; the alpha-A and alpha-B subunits have a 3:1 ratio, respectively. Two additional functions of alpha crystallins are an autokinase activity and participation in the intracellular architecture. The encoded protein has been identified as a moonlighting protein based on its ability to perform mechanistically distinct functions. Alpha-A and alpha-B gene products are differentially expressed; alpha-A is preferentially restricted to the lens and alpha-B is expressed widely in many tissues and organs. Elevated expression of alpha-B crystallin occurs in many neurological diseases; a missense mutation cosegregated in a family with a desmin-related myopathy. Alternative splicing results in multiple transcript variants. NA
PEA15 8682 phosphoprotein enriched in astrocytes 15 ENSG00000162734 This gene encodes a death effector domain-containing protein that functions as a negative regulator of apoptosis. The encoded protein is an endogenous substrate for protein kinase C. This protein is also overexpressed in type 2 diabetes mellitus, where it may contribute to insulin resistance in glucose uptake. Alternative splicing results in multiple transcript variants. NA
CCNI 10983 cyclin I ENSG00000118816 The protein encoded by this gene belongs to the highly conserved cyclin family, whose members are characterized by a dramatic periodicity in protein abundance through the cell cycle. Cyclins function as regulators of CDK kinases. Different cyclins exhibit distinct expression and degradation patterns which contribute to the temporal coordination of each mitotic event. This cyclin shows the highest similarity with cyclin G. The transcript of this gene was found to be expressed constantly during cell cycle progression. The function of this cyclin has not yet been determined. NA
TSC22D4 81628 TSC22 domain family member 4 ENSG00000166925 TSC22D4 is a member of the TSC22 domain family of leucine zipper transcriptional regulators (see TSC22D3; MIM 300506) (Kester et al., 1999 [PubMed 10488076]; Fiorenza et al., 2001 [PubMed 11707329]). NA
HIPK2 28996 homeodomain interacting protein kinase 2 ENSG00000064393 This gene encodes a conserved serine/threonine kinase that is a member of the homeodomain-interacting protein kinase family. The encoded protein interacts with homeodomain transcription factors and many other transcription factors such as p53, and can function as both a corepressor and a coactivator depending on the transcription factor and its subcellular localization. Multiple transcript variants encoding different isoforms have been found for this gene. NA
BSG 682 basigin (Ok blood group) ENSG00000172270 The protein encoded by this gene is a plasma membrane protein that is important in spermatogenesis, embryo implantation, neural network formation, and tumor progression. The encoded protein is also a member of the immunoglobulin superfamily. Multiple transcript variants encoding different isoforms have been found for this gene. NA
DCN 1634 decorin ENSG00000011465 This gene encodes a member of the small leucine-rich proteoglycan family of proteins. Alternative splicing results in multiple transcript variants, at least one of which encodes a preproprotein that is proteolytically processed to generate the mature protein. This protein plays a role in collagen fibril assembly. Binding of this protein to multiple cell surface receptors mediates its role in tumor suppression, including a stimulatory effect on autophagy and inflammation and an inhibitory effect on angiogenesis and tumorigenesis. This gene and the related gene biglycan are thought to be the result of a gene duplication. Mutations in this gene are associated with congenital stromal corneal dystrophy in human patients. NA
ORM1 5004 orosomucoid 1 ENSG00000229314 This gene encodes a key acute phase plasma protein. Because of its increase due to acute inflammation, this protein is classified as an acute-phase reactant. The specific function of this protein has not yet been determined; however, it may be involved in aspects of immunosuppression. NA
KRT10 3858 keratin 10 ENSG00000186395 This gene encodes a member of the type I (acidic) cytokeratin family, which belongs to the superfamily of intermediate filament (IF) proteins. Keratins are heteropolymeric structural proteins which form the intermediate filament. These filaments, along with actin microfilaments and microtubules, compose the cytoskeleton of epithelial cells. Mutations in this gene are associated with epidermolytic hyperkeratosis. This gene is located within a cluster of keratin family members on chromosome 17q21. NA
LRG1 116844 leucine rich alpha-2-glycoprotein 1 ENSG00000171236 The leucine-rich repeat (LRR) family of proteins, including LRG1, have been shown to be involved in protein-protein interaction, signal transduction, and cell adhesion and development. LRG1 is expressed during granulocyte differentiation (O’Donnell et al., 2002 [PubMed 12223515]). NA
AEBP1 165 AE binding protein 1 ENSG00000106624 This gene encodes a member of carboxypeptidase A protein family. The encoded protein may function as a transcriptional repressor and play a role in adipogenesis and smooth muscle cell differentiation. Studies in mice suggest that this gene functions in wound healing and abdominal wall development. Overexpression of this gene is associated with glioblastoma. NA
CASQ2 845 calsequestrin 2 ENSG00000118729 The protein encoded by this gene specifies the cardiac muscle family member of the calsequestrin family. Calsequestrin is localized to the sarcoplasmic reticulum in cardiac and slow skeletal muscle cells. The protein is a calcium binding protein that stores calcium for muscle function. Mutations in this gene cause stress-induced polymorphic ventricular tachycardia, also referred to as catecholaminergic polymorphic ventricular tachycardia 2 (CPVT2), a disease characterized by bidirectional ventricular tachycardia that may lead to cardiac arrest. NA
PYGB 5834 phosphorylase, glycogen; brain ENSG00000100994 The protein encoded by this gene is a glycogen phosphorylase found predominantly in the brain. The encoded protein forms homodimers which can associate into homotetramers, the enzymatically active form of glycogen phosphorylase. The activity of this enzyme is positively regulated by AMP and negatively regulated by ATP, ADP, and glucose-6-phosphate. This enzyme catalyzes the rate-determining step in glycogen degradation. NA
MYOM2 9172 myomesin 2 ENSG00000036448 The giant protein titin, together with its associated proteins, interconnects the major structure of sarcomeres, the M bands and Z discs. The C-terminal end of the titin string extends into the M line, where it binds tightly to M-band constituents of apparent molecular masses of 190 kD and 165 kD. The predicted MYOM2 protein contains 1,465 amino acids. Like MYOM1, MYOM2 has a unique N-terminal domain followed by 12 repeat domains with strong homology to either fibronectin type III or immunoglobulin C2 domains. Protein sequence comparisons suggested that the MYOM2 protein and bovine M protein are identical. NA
TCAP 8557 titin-cap ENSG00000173991 Sarcomere assembly is regulated by the muscle protein titin. Titin is a giant elastic protein with kinase activity that extends half the length of a sarcomere. It serves as a scaffold to which myofibrils and other muscle related proteins are attached. This gene encodes a protein found in striated and cardiac muscle that binds to the titin Z1-Z2 domains and is a substrate of titin kinase, interactions thought to be critical to sarcomere assembly. Mutations in this gene are associated with limb-girdle muscular dystrophy type 2G. NA
GAPDH 2597 glyceraldehyde-3-phosphate dehydrogenase ENSG00000111640 This gene encodes a member of the glyceraldehyde-3-phosphate dehydrogenase protein family. The encoded protein has been identified as a moonlighting protein based on its ability to perform mechanistically distinct functions. The product of this gene catalyzes an important energy-yielding step in carbohydrate metabolism, the reversible oxidative phosphorylation of glyceraldehyde-3-phosphate in the presence of inorganic phosphate and nicotinamide adenine dinucleotide (NAD). The encoded protein has additionally been identified to have uracil DNA glycosylase activity in the nucleus. Also, this protein contains a peptide that has antimicrobial activity against E. coli, P. aeruginosa, and C. albicans. Studies of a similar protein in mouse have assigned a variety of additional functions including nitrosylation of nuclear proteins, the regulation of mRNA stability, and acting as a transferrin receptor on the cell surface of macrophage. Many pseudogenes similar to this locus are present in the human genome. Alternative splicing results in multiple transcript variants. NA
MYH1 4619 myosin, heavy chain 1, skeletal muscle, adult ENSG00000109061 Myosin is a major contractile protein which converts chemical energy into mechanical energy through the hydrolysis of ATP. Myosin is a hexameric protein composed of a pair of myosin heavy chains (MYH) and two pairs of nonidentical light chains. Myosin heavy chains are encoded by a multigene family. In mammals at least 10 different myosin heavy chain (MYH) isoforms have been described from striated, smooth, and nonmuscle cells. These isoforms show expression that is spatially and temporally regulated during development. NA
AOC3 8639 amine oxidase, copper containing 3 ENSG00000131471 This gene encodes a member of the semicarbazide-sensitive amine oxidase family. Copper amine oxidases catalyze the oxidative conversion of amines to aldehydes in the presence of copper and quinone cofactor. The encoded protein is localized to the cell surface, has adhesive properties as well as monoamine oxidase activity, and may be involved in leukocyte trafficking. Alterations in levels of the encoded protein may be associated with many diseases, including diabetes mellitus. A pseudogene of this gene has been described and is located approximately 9-kb downstream on the same chromosome. Alternative splicing results in multiple transcript variants. NA
MAPK8IP1 9479 mitogen-activated protein kinase 8 interacting protein 1 ENSG00000121653 This gene encodes a regulator of the pancreatic beta-cell function. It is highly similar to JIP-1, a mouse protein known to be a regulator of c-Jun amino-terminal kinase (Mapk8). This protein has been shown to prevent MAPK8 mediated activation of transcription factors, and to decrease IL-1 beta and MAP kinase kinase 1 (MEKK1) induced apoptosis in pancreatic beta cells. This protein also functions as a DNA-binding transactivator of the glucose transporter GLUT2. RE1-silencing transcription factor (REST) is reported to repress the expression of this gene in insulin-secreting beta cells. This gene is found to be mutated in a type 2 diabetes family, and thus is thought to be a susceptibility gene for type 2 diabetes. NA
EZR 7430 ezrin ENSG00000092820 The cytoplasmic peripheral membrane protein encoded by this gene functions as a protein-tyrosine kinase substrate in microvilli. As a member of the ERM protein family, this protein serves as an intermediate between the plasma membrane and the actin cytoskeleton. This protein plays a key role in cell surface structure adhesion, migration and organization, and it has been implicated in various human cancers. A pseudogene located on chromosome 3 has been identified for this gene. Alternatively spliced variants have also been described for this gene. NA
F3 2152 coagulation factor III, tissue factor ENSG00000117525 This gene encodes coagulation factor III which is a cell surface glycoprotein. This factor enables cells to initiate the blood coagulation cascades, and it functions as the high-affinity receptor for the coagulation factor VII. The resulting complex provides a catalytic event that is responsible for initiation of the coagulation protease cascades by specific limited proteolysis. Unlike the other cofactors of these protease cascades, which circulate as nonfunctional precursors, this factor is a potent initiator that is fully functional when expressed on cell surfaces. There are 3 distinct domains of this factor: extracellular, transmembrane, and cytoplasmic. This protein is the only one in the coagulation pathway for which a congenital deficiency has not been described. Alternate splicing results in multiple transcript variants. NA
PALLD 23022 palladin, cytoskeletal associated protein ENSG00000129116 This gene encodes a cytoskeletal protein that is required for organizing the actin cytoskeleton. The protein is a component of actin-containing microfilaments, and it is involved in the control of cell shape, adhesion, and contraction. Polymorphisms in this gene are associated with a susceptibility to pancreatic cancer type 1, and also with a risk for myocardial infarction. Alternative splicing results in multiple transcript variants. NA
MB 4151 myoglobin ENSG00000198125 This gene encodes a member of the globin superfamily and is expressed in skeletal and cardiac muscles. The encoded protein is a haemoprotein contributing to intracellular oxygen storage and transcellular facilitated diffusion of oxygen. At least three alternatively spliced transcript variants encoding the same protein have been reported. NA
ITGB4 3691 integrin subunit beta 4 ENSG00000132470 Integrins are heterodimers comprised of alpha and beta subunits, that are noncovalently associated transmembrane glycoprotein receptors. Different combinations of alpha and beta polypeptides form complexes that vary in their ligand-binding specificities. Integrins mediate cell-matrix or cell-cell adhesion, and transduce signals that regulate gene expression and cell growth. This gene encodes the integrin beta 4 subunit, a receptor for the laminins. This subunit tends to associate with alpha 6 subunit and is likely to play a pivotal role in the biology of invasive carcinoma. Mutations in this gene are associated with epidermolysis bullosa with pyloric atresia. Multiple alternatively spliced transcript variants encoding distinct isoforms have been found for this gene. NA
REG1A 5967 regenerating family member 1 alpha ENSG00000115386 This gene is a type I subclass member of the Reg gene family. The Reg gene family is a multigene family grouped into four subclasses, types I, II, III and IV, based on the primary structures of the encoded proteins. This gene encodes a protein that is secreted by the exocrine pancreas. It is associated with islet cell regeneration and diabetogenesis and may be involved in pancreatic lithogenesis. Reg family members REG1B, REGL, PAP and this gene are tandemly clustered on chromosome 2p12 and may have arisen from the same ancestral gene by gene duplication. NA
SYNPO 11346 synaptopodin ENSG00000171992 Synaptopodin is an actin-associated protein that may play a role in actin-based cell shape and motility. The name synaptopodin derives from the protein’s associations with postsynaptic densities and dendritic spines and with renal podocytes (Mundel et al., 1997 [PubMed 9314539]). NA
HSPB7 27129 heat shock protein family B (small) member 7 ENSG00000173641 NA NA
RTKN 6242 rhotekin ENSG00000114993 This gene encodes a scaffold protein that interacts with GTP-bound Rho proteins. Binding of this protein inhibits the GTPase activity of Rho proteins. This protein may interfere with the conversion of active, GTP-bound Rho to the inactive GDP-bound form by RhoGAP. Rho proteins regulate many important cellular processes, including cytokinesis, transcription, smooth muscle contraction, cell growth and transformation. Dysregulation of the Rho signal transduction pathway has been implicated in many forms of cancer. Alternative splicing results in multiple transcript variants encoding different isoforms. NA
MYL3 4634 myosin light chain 3 ENSG00000160808 MYL3 encodes myosin light chain 3, an alkali light chain also referred to in the literature as both the ventricular isoform and the slow skeletal muscle isoform. Mutations in MYL3 have been identified as a cause of mid-left ventricular chamber type hypertrophic cardiomyopathy. NA
PAQR6 79957 progestin and adipoQ receptor family member 6 ENSG00000160781 NA NA
FGA 2243 fibrinogen alpha chain ENSG00000171560 This gene encodes the alpha subunit of the coagulation factor fibrinogen, which is a component of the blood clot. Following vascular injury, the encoded preproprotein is proteolytically processed by thrombin during the conversion of fibrinogen to fibrin. Mutations in this gene lead to several disorders, including dysfibrinogenemia, hypofibrinogenemia, afibrinogenemia and renal amyloidosis. Alternative splicing results in multiple transcript variants, at least one of which encodes an isoform that undergoes proteolytic processing. NA
HRC 3270 histidine rich calcium binding protein ENSG00000130528 This gene encodes a luminal sarcoplasmic reticulum protein identified by its ability to bind low-density lipoprotein with high affinity. The protein interacts with the cytoplasmic domain of triadin, the main transmembrane protein of the junctional sarcoplasmic reticulum (SR) of skeletal muscle. The protein functions in the regulation of releasable calcium into the SR. NA
RPL37A 6168 ribosomal protein L37a ENSG00000197756 Ribosomes, the organelles that catalyze protein synthesis, consist of a small 40S subunit and a large 60S subunit. Together these subunits are composed of 4 RNA species and approximately 80 structurally distinct proteins. This gene encodes a ribosomal protein that is a component of the 60S subunit. The protein belongs to the L37AE family of ribosomal proteins. It is located in the cytoplasm. The protein contains a C4-type zinc finger-like domain. As is typical for genes encoding ribosomal proteins, there are multiple processed pseudogenes of this gene dispersed through the genome. NA
CMYA5 202333 cardiomyopathy associated 5 ENSG00000164309 NA NA
NRAP 4892 nebulin related anchoring protein ENSG00000197893 NA NA
ANKRD1 27063 ankyrin repeat domain 1 ENSG00000148677 The protein encoded by this gene is localized to the nucleus of endothelial cells and is induced by IL-1 and TNF-alpha stimulation. Studies in rat cardiomyocytes suggest that this gene functions as a transcription factor. Interactions between this protein and the sarcomeric proteins myopalladin and titin suggest that it may also be involved in the myofibrillar stretch-sensor system. NA
A2M 2 alpha-2-macroglobulin ENSG00000175899 Alpha-2-macroglobulin is a protease inhibitor and cytokine transporter. It inhibits many proteases, including trypsin, thrombin and collagenase. A2M is implicated in Alzheimer disease (AD) due to its ability to mediate the clearance and degradation of A-beta, the major component of beta-amyloid deposits. NA
KIAA0930 23313 KIAA0930 ENSG00000100364 NA NA
IGFBP2 3485 insulin like growth factor binding protein 2 ENSG00000115457 The protein encoded by this gene is one of six similar proteins that bind insulin-like growth factors I and II (IGF-I and IGF-II). The encoded protein can be secreted into the bloodstream, where it binds IGF-I and IGF-II with high affinity, or it can remain intracellular, interacting with many different ligands. High expression levels of this protein promote the growth of several types of tumors and may be predictive of the chances of recovery of the patient. Several transcript variants, one encoding a secreted isoform and the others encoding nonsecreted isoforms, have been found for this gene. NA
CD36 948 CD36 molecule ENSG00000135218 The protein encoded by this gene is the fourth major glycoprotein of the platelet surface and serves as a receptor for thrombospondin in platelets and various cell lines. Since thrombospondins are widely distributed proteins involved in a variety of adhesive processes, this protein may have important functions as a cell adhesion molecule. It binds to collagen, thrombospondin, anionic phospholipids and oxidized LDL. It directly mediates cytoadherence of Plasmodium falciparum parasitized erythrocytes and it binds long chain fatty acids and may function in the transport and/or as a regulator of fatty acid transport. Mutations in this gene cause platelet glycoprotein deficiency. Multiple alternatively spliced transcript variants have been found for this gene. NA
NA NA NA ENSG00000140181 NA TRUE
CNP 1267 2’,3’-cyclic nucleotide 3’ phosphodiesterase ENSG00000173786 NA NA
NA NA NA ENSG00000256545 NA TRUE
REG1B 5968 regenerating family member 1 beta ENSG00000172023 This gene is a type I subclass member of the Reg gene family. The Reg gene family is a multigene family grouped into four subclasses, types I, II, III and IV based on the primary structures of the encoded proteins. This gene encodes a protein secreted by the exocrine pancreas that is highly similar to the REG1A protein. The related REG1A protein is associated with islet cell regeneration and diabetogenesis, and may be involved in pancreatic lithogenesis. Reg family members REG1A, REGL, PAP and this gene are tandemly clustered on chromosome 2p12 and may have arisen from the same ancestral gene by gene duplication. NA
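# Save the Ensembl IDs queried for factor 3, one per line.
# out$query also contains the IDs that mygene could not match (notfound = TRUE).
write.table(as.factor(out$query), paste0("../utilities/GTEX2013_sparse_fac_sqrt/gene_names_clus_", 3, ".txt"),
            col.names = FALSE, row.names = FALSE, quote = FALSE);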
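
The same write step is repeated for every factor, and the Factor 3 table contains rows with `notfound = TRUE`. The sketch below shows one way to wrap that step in a helper that can optionally drop unmatched IDs before writing. The function name `write_factor_genes`, its `drop_notfound` argument, and the toy data frame are assumptions made for illustration and are not part of the original analysis; the sketch uses `writeLines` rather than `write.table`, which should give the same one-ID-per-line file.

```r
# Hypothetical helper (not in the original analysis): write the Ensembl IDs
# of one factor to gene_names_clus_<factor_idx>.txt, one ID per line.
# `annot` is any data-frame-like object with a `query` column and an
# optional logical `notfound` column, as in the tables above.
write_factor_genes <- function(annot, factor_idx, out_dir, drop_notfound = FALSE) {
  ids <- as.character(annot$query)
  if (drop_notfound && !is.null(annot$notfound)) {
    # notfound is TRUE for unmatched IDs and NA for matched ones
    unmatched <- !is.na(annot$notfound) & annot$notfound
    ids <- ids[!unmatched]
  }
  path <- file.path(out_dir, paste0("gene_names_clus_", factor_idx, ".txt"))
  writeLines(ids, path)
  invisible(path)
}

# Toy input mimicking three rows of the Factor 3 table
toy <- data.frame(
  query    = c("ENSG00000135218", "ENSG00000140181", "ENSG00000173786"),
  symbol   = c("CD36", NA, "CNP"),
  notfound = c(NA, TRUE, NA),
  stringsAsFactors = FALSE
)

path <- write_factor_genes(toy, 3, tempdir(), drop_notfound = TRUE)
kept <- readLines(path)
```

With `drop_notfound = FALSE` (the default) the helper writes every queried ID, which matches the behavior of the `write.table` call above.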

Factor 4 Annotations

out <- mygene::queryMany(gene_list[4,],  scopes="ensembl.gene", fields=c("name", "summary", "symbol"), species="human");
## Finished
## Pass returnall=TRUE to return lists of duplicate or missing query terms.
kable(as.data.frame(out))
symbol X_id summary query name
SAA1 6288 This gene encodes a member of the serum amyloid A family of apolipoproteins. The encoded preproprotein is proteolytically processed to generate the mature protein. This protein is a major acute phase protein that is highly expressed in response to inflammation and tissue injury. This protein also plays an important role in HDL metabolism and cholesterol homeostasis. High levels of this protein are associated with chronic inflammatory diseases including atherosclerosis, rheumatoid arthritis, Alzheimer’s disease and Crohn’s disease. This protein may also be a potential biomarker for certain tumors. Alternate splicing results in multiple transcript variants that encode the same protein. A pseudogene of this gene is found on chromosome 11. ENSG00000173432 serum amyloid A1
MYH11 4629 The protein encoded by this gene is a smooth muscle myosin belonging to the myosin heavy chain family. The gene product is a subunit of a hexameric protein that consists of two heavy chain subunits and two pairs of non-identical light chain subunits. It functions as a major contractile protein, converting chemical energy into mechanical energy through the hydrolysis of ATP. The gene encoding a human ortholog of rat NUDE1 is transcribed from the reverse strand of this gene, and its 3’ end overlaps with that of the latter. The pericentric inversion of chromosome 16 [inv(16)(p13q22)] produces a chimeric transcript that encodes a protein consisting of the first 165 residues from the N terminus of core-binding factor beta in a fusion with the C-terminal portion of the smooth muscle myosin heavy chain. This chromosomal rearrangement is associated with acute myeloid leukemia of the M4Eo subtype. Alternative splicing generates isoforms that are differentially expressed, with ratios changing during muscle cell maturation. Alternatively spliced transcript variants encoding different isoforms have been identified. ENSG00000133392 myosin, heavy chain 11, smooth muscle
ACTG1 71 Actins are highly conserved proteins that are involved in various types of cell motility, and maintenance of the cytoskeleton. In vertebrates, three main groups of actin isoforms, alpha, beta and gamma have been identified. The alpha actins are found in muscle tissues and are a major constituent of the contractile apparatus. The beta and gamma actins co-exist in most cell types as components of the cytoskeleton, and as mediators of internal cell motility. Actin, gamma 1, encoded by this gene, is a cytoplasmic actin found in non-muscle cells. Mutations in this gene are associated with DFNA20/26, a subtype of autosomal dominant non-syndromic sensorineural progressive hearing loss. Alternative splicing results in multiple transcript variants. ENSG00000184009 actin gamma 1
KRT10 3858 This gene encodes a member of the type I (acidic) cytokeratin family, which belongs to the superfamily of intermediate filament (IF) proteins. Keratins are heteropolymeric structural proteins which form the intermediate filament. These filaments, along with actin microfilaments and microtubules, compose the cytoskeleton of epithelial cells. Mutations in this gene are associated with epidermolytic hyperkeratosis. This gene is located within a cluster of keratin family members on chromosome 17q21. ENSG00000186395 keratin 10
FABP4 2167 FABP4 encodes the fatty acid binding protein found in adipocytes. Fatty acid binding proteins are a family of small, highly conserved, cytoplasmic proteins that bind long-chain fatty acids and other hydrophobic ligands. It is thought that FABPs' roles include fatty acid uptake, transport, and metabolism. ENSG00000170323 fatty acid binding protein 4
CD36 948 The protein encoded by this gene is the fourth major glycoprotein of the platelet surface and serves as a receptor for thrombospondin in platelets and various cell lines. Since thrombospondins are widely distributed proteins involved in a variety of adhesive processes, this protein may have important functions as a cell adhesion molecule. It binds to collagen, thrombospondin, anionic phospholipids and oxidized LDL. It directly mediates cytoadherence of Plasmodium falciparum parasitized erythrocytes and it binds long chain fatty acids and may function in the transport and/or as a regulator of fatty acid transport. Mutations in this gene cause platelet glycoprotein deficiency. Multiple alternatively spliced transcript variants have been found for this gene. ENSG00000135218 CD36 molecule
PKM 5315 This gene encodes a protein involved in glycolysis. The encoded protein is a pyruvate kinase that catalyzes the transfer of a phosphoryl group from phosphoenolpyruvate to ADP, generating ATP and pyruvate. This protein has been shown to interact with thyroid hormone and may mediate cellular metabolic effects induced by thyroid hormones. This protein has been found to bind Opa protein, a bacterial outer membrane protein involved in gonococcal adherence to and invasion of human cells, suggesting a role of this protein in bacterial pathogenesis. Several alternatively spliced transcript variants encoding a few distinct isoforms have been reported. ENSG00000067225 pyruvate kinase, muscle
ACSL1 2180 The protein encoded by this gene is an isozyme of the long-chain fatty-acid-coenzyme A ligase family. Although differing in substrate specificity, subcellular localization, and tissue distribution, all isozymes of this family convert free long-chain fatty acids into fatty acyl-CoA esters, and thereby play a key role in lipid biosynthesis and fatty acid degradation. Several transcript variants encoding different isoforms have been found for this gene. ENSG00000151726 acyl-CoA synthetase long-chain family member 1
GAPDH 2597 This gene encodes a member of the glyceraldehyde-3-phosphate dehydrogenase protein family. The encoded protein has been identified as a moonlighting protein based on its ability to perform mechanistically distinct functions. The product of this gene catalyzes an important energy-yielding step in carbohydrate metabolism, the reversible oxidative phosphorylation of glyceraldehyde-3-phosphate in the presence of inorganic phosphate and nicotinamide adenine dinucleotide (NAD). The encoded protein has additionally been identified to have uracil DNA glycosylase activity in the nucleus. Also, this protein contains a peptide that has antimicrobial activity against E. coli, P. aeruginosa, and C. albicans. Studies of a similar protein in mouse have assigned a variety of additional functions including nitrosylation of nuclear proteins, the regulation of mRNA stability, and acting as a transferrin receptor on the cell surface of macrophage. Many pseudogenes similar to this locus are present in the human genome. Alternative splicing results in multiple transcript variants. ENSG00000111640 glyceraldehyde-3-phosphate dehydrogenase
RBP4 5950 This protein belongs to the lipocalin family and is the specific carrier for retinol (vitamin A alcohol) in the blood. It delivers retinol from the liver stores to the peripheral tissues. In plasma, the RBP-retinol complex interacts with transthyretin which prevents its loss by filtration through the kidney glomeruli. A deficiency of vitamin A blocks secretion of the binding protein posttranslationally and results in defective delivery and supply to the epidermal cells. ENSG00000138207 retinol binding protein 4
THBS1 7057 The protein encoded by this gene is a subunit of a disulfide-linked homotrimeric protein. This protein is an adhesive glycoprotein that mediates cell-to-cell and cell-to-matrix interactions. This protein can bind to fibrinogen, fibronectin, laminin, type V collagen and integrins alpha-V/beta-1. This protein has been shown to play roles in platelet aggregation, angiogenesis, and tumorigenesis. ENSG00000137801 thrombospondin 1
ACTN4 81 Alpha actinins belong to the spectrin gene superfamily which represents a diverse group of cytoskeletal proteins, including the alpha and beta spectrins and dystrophins. Alpha actinin is an actin-binding protein with multiple roles in different cell types. In nonmuscle cells, the cytoskeletal isoform is found along microfilament bundles and adherens-type junctions, where it is involved in binding actin to the membrane. In contrast, skeletal, cardiac, and smooth muscle isoforms are localized to the Z-disc and analogous dense bodies, where they help anchor the myofibrillar actin filaments. This gene encodes a nonmuscle, alpha actinin isoform which is concentrated in the cytoplasm, and thought to be involved in metastatic processes. Mutations in this gene have been associated with focal and segmental glomerulosclerosis. ENSG00000130402 actinin alpha 4
PLIN2 123 The protein encoded by this gene belongs to the perilipin family, members of which coat intracellular lipid storage droplets. This protein is associated with the lipid globule surface membrane material, and may be involved in development and maintenance of adipose tissue. However, it is not restricted to adipocytes as previously thought, but is found in a wide range of cultured cell lines, including fibroblasts, endothelial and epithelial cells, and tissues, such as lactating mammary gland, adrenal cortex, Sertoli and Leydig cells, and hepatocytes in alcoholic liver cirrhosis, suggesting that it may serve as a marker of lipid accumulation in diverse cell types and diseases. Alternatively spliced transcript variants have been found for this gene. ENSG00000147872 perilipin 2
ADH1B 125 The protein encoded by this gene is a member of the alcohol dehydrogenase family. Members of this enzyme family metabolize a wide variety of substrates, including ethanol, retinol, other aliphatic alcohols, hydroxysteroids, and lipid peroxidation products. This encoded protein, consisting of several homo- and heterodimers of alpha, beta, and gamma subunits, exhibits high activity for ethanol oxidation and plays a major role in ethanol catabolism. Three genes encoding alpha, beta and gamma subunits are tandemly organized in a genomic segment as a gene cluster. Two transcript variants encoding different isoforms have been found for this gene. ENSG00000196616 alcohol dehydrogenase 1B (class I), beta polypeptide
PYGM 5837 This gene encodes a muscle enzyme involved in glycogenolysis. Highly similar enzymes encoded by different genes are found in liver and brain. Mutations in this gene are associated with McArdle disease (myophosphorylase deficiency), a glycogen storage disease of muscle. Alternative splicing results in multiple transcript variants. ENSG00000068976 phosphorylase, glycogen, muscle
ALDOA 226 The protein encoded by this gene, Aldolase A (fructose-bisphosphate aldolase), is a glycolytic enzyme that catalyzes the reversible conversion of fructose-1,6-bisphosphate to glyceraldehyde 3-phosphate and dihydroxyacetone phosphate. Three aldolase isozymes (A, B, and C), encoded by three different genes, are differentially expressed during development. Aldolase A is found in the developing embryo and is produced in even greater amounts in adult muscle. Aldolase A expression is repressed in adult liver, kidney and intestine and similar to aldolase C levels in brain and other nervous tissue. Aldolase A deficiency has been associated with myopathy and hemolytic anemia. Alternative splicing and alternative promoter usage result in multiple transcript variants. Related pseudogenes have been identified on chromosomes 3 and 10. ENSG00000149925 aldolase, fructose-bisphosphate A
REG1A 5967 This gene is a type I subclass member of the Reg gene family. The Reg gene family is a multigene family grouped into four subclasses, types I, II, III and IV, based on the primary structures of the encoded proteins. This gene encodes a protein that is secreted by the exocrine pancreas. It is associated with islet cell regeneration and diabetogenesis and may be involved in pancreatic lithogenesis. Reg family members REG1B, REGL, PAP and this gene are tandemly clustered on chromosome 2p12 and may have arisen from the same ancestral gene by gene duplication. ENSG00000115386 regenerating family member 1 alpha
ACTB 60 This gene encodes one of six different actin proteins. Actins are highly conserved proteins that are involved in cell motility, structure, and integrity. This actin is a major constituent of the contractile apparatus and one of the two nonmuscle cytoskeletal actins. ENSG00000075624 actin, beta
MYH9 4627 This gene encodes a conventional non-muscle myosin; this protein should not be confused with the unconventional myosin-9a or 9b (MYO9A or MYO9B). The encoded protein is a myosin IIA heavy chain that contains an IQ domain and a myosin head-like domain which is involved in several important functions, including cytokinesis, cell motility and maintenance of cell shape. Defects in this gene have been associated with non-syndromic sensorineural deafness autosomal dominant type 17, Epstein syndrome, Alport syndrome with macrothrombocytopenia, Sebastian syndrome, Fechtner syndrome and macrothrombocytopenia with progressive sensorineural deafness. ENSG00000100345 myosin, heavy chain 9, non-muscle
GP2 2813 This gene encodes an integral membrane protein that is secreted from intracellular zymogen granules and associates with the plasma membrane via glycosylphosphatidylinositol (GPI) linkage. The encoded protein binds pathogens such as enterobacteria, thereby playing an important role in the innate immune response. The C-terminus of this protein is related to the C-terminus of the protein encoded by the neighboring gene, uromodulin (UMOD). Alternative splicing results in multiple transcript variants. ENSG00000169347 glycoprotein 2
ACTA2 59 The protein encoded by this gene belongs to the actin family of proteins, which are highly conserved proteins that play a role in cell motility, structure and integrity. Alpha, beta and gamma actin isoforms have been identified, with alpha actins being a major constituent of the contractile apparatus, while beta and gamma actins are involved in the regulation of cell motility. This actin is an alpha actin that is found in skeletal muscle. Defects in this gene cause aortic aneurysm familial thoracic type 6. Multiple alternatively spliced variants, encoding the same protein, have been identified. ENSG00000107796 actin, alpha 2, smooth muscle, aorta
GRINA 2907 NA ENSG00000178719 glutamate ionotropic receptor NMDA type subunit associated protein 1
LPL 4023 LPL encodes lipoprotein lipase, which is expressed in heart, muscle, and adipose tissue. LPL functions as a homodimer, and has the dual functions of triglyceride hydrolase and ligand/bridging factor for receptor-mediated lipoprotein uptake. Severe mutations that cause LPL deficiency result in type I hyperlipoproteinemia, while less extreme mutations in LPL are linked to many disorders of lipoprotein metabolism. ENSG00000175445 lipoprotein lipase
S100A9 6280 The protein encoded by this gene is a member of the S100 family of proteins containing 2 EF-hand calcium-binding motifs. S100 proteins are localized in the cytoplasm and/or nucleus of a wide range of cells, and involved in the regulation of a number of cellular processes such as cell cycle progression and differentiation. S100 genes include at least 13 members which are located as a cluster on chromosome 1q21. This protein may function in the inhibition of casein kinase and altered expression of this protein is associated with the disease cystic fibrosis. This antimicrobial protein exhibits antifungal and antibacterial activity. ENSG00000163220 S100 calcium binding protein A9
ECHDC2 55268 NA ENSG00000121310 enoyl-CoA hydratase domain containing 2
HBA2 3040 The human alpha globin gene cluster located on chromosome 16 spans about 30 kb and includes seven loci: 5’- zeta - pseudozeta - mu - pseudoalpha-1 - alpha-2 - alpha-1 - theta - 3’. The alpha-2 (HBA2) and alpha-1 (HBA1) coding sequences are identical. These genes differ slightly over the 5’ untranslated regions and the introns, but they differ significantly over the 3’ untranslated regions. Two alpha chains plus two beta chains constitute HbA, which in normal adult life comprises about 97% of the total hemoglobin; alpha chains combine with delta chains to constitute HbA-2, which with HbF (fetal hemoglobin) makes up the remaining 3% of adult hemoglobin. Alpha thalassemias result from deletions of each of the alpha genes as well as deletions of both HBA2 and HBA1; some nondeletion alpha thalassemias have also been reported. ENSG00000188536 hemoglobin subunit alpha 2
MYH2 4620 Myosins are actin-based motor proteins that function in the generation of mechanical force in eukaryotic cells. Muscle myosins are heterohexamers composed of 2 myosin heavy chains and 2 pairs of nonidentical myosin light chains. This gene encodes a member of the class II or conventional myosin heavy chains, and functions in skeletal muscle contraction. This gene is found in a cluster of myosin heavy chain genes on chromosome 17. A mutation in this gene results in inclusion body myopathy-3. Multiple alternatively spliced variants, encoding the same protein, have been identified. ENSG00000125414 myosin, heavy chain 2, skeletal muscle, adult
C1S 716 This gene encodes a serine protease, which is a major constituent of the human complement subcomponent C1. C1s associates with two other complement components C1r and C1q in order to yield the first component of the serum complement system. Defects in this gene are the cause of selective C1s deficiency. ENSG00000182326 complement component 1, s subcomponent
MYBPC1 4604 This gene encodes a member of the myosin-binding protein C family. Myosin-binding protein C family members are myosin-associated proteins found in the cross-bridge-bearing zone (C region) of A bands in striated muscle. The encoded protein is the slow skeletal muscle isoform of myosin-binding protein C and plays an important role in muscle contraction by recruiting muscle-type creatine kinase to myosin filaments. Mutations in this gene are associated with distal arthrogryposis type I. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene. ENSG00000196091 myosin binding protein C, slow type
CPA1 1357 This gene encodes a member of the carboxypeptidase A family of zinc metalloproteases. This enzyme is produced in the pancreas and preferentially cleaves C-terminal branched-chain and aromatic amino acids from dietary proteins. This gene and several family members are present in a gene cluster on chromosome 7. Mutations in this gene may be linked to chronic pancreatitis, while elevated protein levels may be associated with pancreatic cancer. ENSG00000091704 carboxypeptidase A1
ACADVL 37 The protein encoded by this gene is targeted to the inner mitochondrial membrane where it catalyzes the first step of the mitochondrial fatty acid beta-oxidation pathway. This acyl-Coenzyme A dehydrogenase is specific to long-chain and very-long-chain fatty acids. A deficiency in this gene product reduces myocardial fatty acid beta-oxidation and is associated with cardiomyopathy. Alternative splicing results in multiple transcript variants encoding different isoforms. ENSG00000072778 acyl-CoA dehydrogenase, very long chain
GLUL 2752 The protein encoded by this gene belongs to the glutamine synthetase family. It catalyzes the synthesis of glutamine from glutamate and ammonia in an ATP-dependent reaction. This protein plays a role in ammonia and glutamate detoxification, acid-base homeostasis, cell signaling, and cell proliferation. Glutamine is an abundant amino acid, and is important to the biosynthesis of several amino acids, pyrimidines, and purines. Mutations in this gene are associated with congenital glutamine deficiency, and overexpression of this gene was observed in some primary liver cancer samples. There are six pseudogenes of this gene found on chromosomes 2, 5, 9, 11, and 12. Alternative splicing results in multiple transcript variants. ENSG00000135821 glutamate-ammonia ligase
PRSS1 5644 This gene encodes a trypsinogen, which is a member of the trypsin family of serine proteases. This enzyme is secreted by the pancreas and cleaved to its active form in the small intestine. It is active on peptide linkages involving the carboxyl group of lysine or arginine. Mutations in this gene are associated with hereditary pancreatitis. This gene and several other trypsinogen genes are localized to the T cell receptor beta locus on chromosome 7. ENSG00000204983 protease, serine 1
CCNL2 81669 The protein encoded by this gene belongs to the cyclin family. Through its interaction with several proteins, such as RNA polymerase II, splicing factors, and cyclin-dependent kinases, this protein functions as a regulator of the pre-mRNA splicing process, as well as in inducing apoptosis by modulating the expression of apoptotic and antiapoptotic proteins. Alternatively spliced transcript variants encoding different isoforms have been described for this gene. ENSG00000221978 cyclin L2
STAB1 23166 This gene encodes a large, transmembrane receptor protein which may function in angiogenesis, lymphocyte homing, cell adhesion, or receptor scavenging. The protein contains 7 fasciclin, 16 epidermal growth factor (EGF)-like, and 2 laminin-type EGF-like domains as well as a C-type lectin-like hyaluronan-binding Link module. The protein is primarily expressed on sinusoidal endothelial cells of liver, spleen, and lymph node. The receptor has been shown to endocytose ligands such as low density lipoprotein, Gram-positive and Gram-negative bacteria, and advanced glycosylation end products. Supporting its possible role as a scavenger receptor, the protein rapidly cycles between the plasma membrane and early endosomes. ENSG00000010327 stabilin 1
SERPINF1 5176 The protein encoded by this gene is a member of the serpin family, although it does not display the serine protease inhibitory activity shown by many of the other serpin family members. The encoded protein is secreted and strongly inhibits angiogenesis. In addition, this protein is a neurotrophic factor involved in neuronal differentiation in retinoblastoma cells. ENSG00000132386 serpin family F member 1
MYH1 4619 Myosin is a major contractile protein which converts chemical energy into mechanical energy through the hydrolysis of ATP. Myosin is a hexameric protein composed of a pair of myosin heavy chains (MYH) and two pairs of nonidentical light chains. Myosin heavy chains are encoded by a multigene family. In mammals at least 10 different myosin heavy chain (MYH) isoforms have been described from striated, smooth, and nonmuscle cells. These isoforms show expression that is spatially and temporally regulated during development. ENSG00000109061 myosin, heavy chain 1, skeletal muscle, adult
MYLK 4638 This gene, a muscle member of the immunoglobulin gene superfamily, encodes myosin light chain kinase which is a calcium/calmodulin dependent enzyme. This kinase phosphorylates myosin regulatory light chains to facilitate myosin interaction with actin filaments to produce contractile activity. This gene encodes both smooth muscle and nonmuscle isoforms. In addition, using a separate promoter in an intron in the 3’ region, it encodes telokin, a small protein identical in sequence to the C-terminus of myosin light chain kinase, that is independently expressed in smooth muscle and functions to stabilize unphosphorylated myosin filaments. A pseudogene is located on the p arm of chromosome 3. Four transcript variants that produce four isoforms of the calcium/calmodulin dependent enzyme have been identified as well as two transcripts that produce two isoforms of telokin. Additional variants have been identified but lack full length transcripts. ENSG00000065534 myosin light chain kinase
CEL 1056 The protein encoded by this gene is a glycoprotein secreted from the pancreas into the digestive tract and from the lactating mammary gland into human milk. The physiological role of this protein is in cholesterol and lipid-soluble vitamin ester hydrolysis and absorption. This encoded protein promotes large chylomicron production in the intestine. Also its presence in plasma suggests its interactions with cholesterol and oxidized lipoproteins to modulate the progression of atherosclerosis. In pancreatic tumoral cells, this encoded protein is thought to be sequestrated within the Golgi compartment and is probably not secreted. This gene contains a variable number of tandem repeat (VNTR) polymorphism in the coding region that may influence the function of the encoded protein. ENSG00000170835 carboxyl ester lipase
SAA2 6289 NA ENSG00000134339 serum amyloid A2
PLA2G2A 5320 The protein encoded by this gene is a member of the phospholipase A2 family (PLA2). PLA2s constitute a diverse family of enzymes with respect to sequence, function, localization, and divalent cation requirements. This gene product belongs to group II, which contains the secreted form of PLA2, an extracellular enzyme that has a low molecular mass and requires calcium ions for catalysis. It catalyzes the hydrolysis of the sn-2 fatty acid acyl ester bond of phosphoglycerides, releasing free fatty acids and lysophospholipids, and is thought to participate in the regulation of phospholipid metabolism in biomembranes. Several alternatively spliced transcript variants with different 5’ UTRs have been found for this gene. ENSG00000188257 phospholipase A2 group IIA
APMAP 57136 NA ENSG00000101474 adipocyte plasma membrane associated protein
PLN 5350 The protein encoded by this gene is found as a pentamer and is a major substrate for the cAMP-dependent protein kinase in cardiac muscle. The encoded protein is an inhibitor of cardiac muscle sarcoplasmic reticulum Ca(2+)-ATPase in the unphosphorylated state, but inhibition is relieved upon phosphorylation of the protein. The subsequent activation of the Ca(2+) pump leads to enhanced muscle relaxation rates, thereby contributing to the inotropic response elicited in heart by beta-agonists. The encoded protein is a key regulator of cardiac diastolic function. Mutations in this gene are a cause of inherited human dilated cardiomyopathy with refractory congestive heart failure, and also familial hypertrophic cardiomyopathy. ENSG00000198523 phospholamban
CPB1 1360 Three different procarboxypeptidases A and two different procarboxypeptidases B have been isolated. The B1 and B2 forms differ from each other mainly in isoelectric point. Carboxypeptidase B1 is a highly tissue-specific protein and is a useful serum marker for acute pancreatitis and dysfunction of pancreatic transplants. It is not elevated in pancreatic carcinoma. ENSG00000153002 carboxypeptidase B1
COL6A1 1291 The collagens are a superfamily of proteins that play a role in maintaining the integrity of various tissues. Collagens are extracellular matrix proteins and have a triple-helical domain as their common structural element. Collagen VI is a major structural component of microfibrils. The basic structural unit of collagen VI is a heterotrimer of the alpha1(VI), alpha2(VI), and alpha3(VI) chains. The alpha2(VI) and alpha3(VI) chains are encoded by the COL6A2 and COL6A3 genes, respectively. The protein encoded by this gene is the alpha 1 subunit of type VI collagen (alpha1(VI) chain). Mutations in the genes that code for the collagen VI subunits result in the autosomal dominant disorder, Bethlem myopathy. ENSG00000142156 collagen type VI alpha 1
OAF 220323 NA ENSG00000184232 out at first homolog
TTN 7273 This gene encodes a large abundant protein of striated muscle. The product of this gene is divided into two regions, an N-terminal I-band and a C-terminal A-band. The I-band, which is the elastic part of the molecule, contains two regions of tandem immunoglobulin domains on either side of a PEVK region that is rich in proline, glutamate, valine and lysine. The A-band, which is thought to act as a protein-ruler, contains a mixture of immunoglobulin and fibronectin repeats, and possesses kinase activity. An N-terminal Z-disc region and a C-terminal M-line region bind to the Z-line and M-line of the sarcomere, respectively, so that a single titin molecule spans half the length of a sarcomere. Titin also contains binding sites for muscle associated proteins so it serves as an adhesion template for the assembly of contractile machinery in muscle cells. It has also been identified as a structural protein for chromosomes. Alternative splicing of this gene results in multiple transcript variants. Considerable variability exists in the I-band, the M-line and the Z-disc regions of titin. Variability in the I-band region contributes to the differences in elasticity of different titin isoforms and, therefore, to the differences in elasticity of different muscle types. Mutations in this gene are associated with familial hypertrophic cardiomyopathy 9, and autoantibodies to titin are produced in patients with the autoimmune disease scleroderma. ENSG00000155657 titin
KRT1 3848 The protein encoded by this gene is a member of the keratin gene family. The type II cytokeratins consist of basic or neutral proteins which are arranged in pairs of heterotypic keratin chains coexpressed during differentiation of simple and stratified epithelial tissues. This type II cytokeratin is specifically expressed in the spinous and granular layers of the epidermis with family member KRT10 and mutations in these genes have been associated with bullous congenital ichthyosiform erythroderma. The type II cytokeratins are clustered in a region of chromosome 12q12-q13. ENSG00000167768 keratin 1
ACTN1 87 Alpha actinins belong to the spectrin gene superfamily which represents a diverse group of cytoskeletal proteins, including the alpha and beta spectrins and dystrophins. Alpha actinin is an actin-binding protein with multiple roles in different cell types. In nonmuscle cells, the cytoskeletal isoform is found along microfilament bundles and adherens-type junctions, where it is involved in binding actin to the membrane. In contrast, skeletal, cardiac, and smooth muscle isoforms are localized to the Z-disc and analogous dense bodies, where they help anchor the myofibrillar actin filaments. This gene encodes a nonmuscle, cytoskeletal, alpha actinin isoform and maps to the same site as the structurally similar erythroid beta spectrin gene. Three transcript variants encoding different isoforms have been found for this gene. ENSG00000072110 actinin alpha 1
DGAT2 84649 This gene encodes one of two enzymes which catalyze the final reaction in the synthesis of triglycerides in which diacylglycerol is covalently bound to long chain fatty acyl-CoAs. The encoded protein catalyzes this reaction at low concentrations of magnesium chloride while the other enzyme has high activity at high concentrations of magnesium chloride. Multiple transcript variants encoding different isoforms have been found for this gene. ENSG00000062282 diacylglycerol O-acyltransferase 2
IQGAP1 8826 This gene encodes a member of the IQGAP family. The protein contains four IQ domains, one calponin homology domain, one Ras-GAP domain and one WW domain. It interacts with components of the cytoskeleton, with cell adhesion molecules, and with several signaling molecules to regulate cell morphology and motility. Expression of the protein is upregulated by gene amplification in two gastric cancer cell lines. ENSG00000140575 IQ motif containing GTPase activating protein 1
MEF2C 4208 This locus encodes a member of the MADS box transcription enhancer factor 2 (MEF2) family of proteins, which play a role in myogenesis. The encoded protein, MEF2 polypeptide C, has both trans-activating and DNA binding activities. This protein may play a role in maintaining the differentiated state of muscle cells. Mutations and deletions at this locus have been associated with severe mental retardation, stereotypic movements, epilepsy, and cerebral malformation. Alternatively spliced transcript variants have been described. ENSG00000081189 myocyte enhancer factor 2C
NNMT 4837 N-methylation is one method by which drug and other xenobiotic compounds are metabolized by the liver. This gene encodes the protein responsible for this enzymatic activity which uses S-adenosyl methionine as the methyl donor. ENSG00000166741 nicotinamide N-methyltransferase
RGS5 8490 This gene encodes a member of the regulators of G protein signaling (RGS) family. The RGS proteins are signal transduction molecules which are involved in the regulation of heterotrimeric G proteins by acting as GTPase activators. This gene is a hypoxia-inducible factor-1 dependent, hypoxia-induced gene which is involved in the induction of endothelial apoptosis. This gene is also one of three genes on chromosome 1q contributing to elevated blood pressure. Alternatively spliced transcript variants have been identified. ENSG00000143248 regulator of G-protein signaling 5
CELA3A 10136 Elastases form a subfamily of serine proteases that hydrolyze many proteins in addition to elastin. Humans have six elastase genes which encode the structurally similar proteins elastase 1, 2, 2A, 2B, 3A, and 3B. Unlike other elastases, elastase 3A has little elastolytic activity. Like most of the human elastases, elastase 3A is secreted from the pancreas as a zymogen and, like other serine proteases such as trypsin, chymotrypsin and kallikrein, it has a digestive function in the intestine. Elastase 3A preferentially cleaves proteins after alanine residues. Elastase 3A may also function in the intestinal transport and metabolism of cholesterol. Both elastase 3A and elastase 3B have been referred to as protease E and as elastase 1. ENSG00000142789 chymotrypsin like elastase family member 3A
SPINT1 6692 The protein encoded by this gene is a member of the Kunitz family of serine protease inhibitors. The protein is a potent inhibitor specific for HGF activator and is thought to be involved in the regulation of the proteolytic activation of HGF in injured tissues. Alternative splicing results in multiple variants encoding different isoforms. ENSG00000166145 serine peptidase inhibitor, Kunitz type 1
LYZ 4069 This gene encodes human lysozyme, whose natural substrate is the bacterial cell wall peptidoglycan (cleaving the beta[1-4]glycosidic linkages between N-acetylmuramic acid and N-acetylglucosamine). Lysozyme is one of the antimicrobial agents found in human milk, and is also present in spleen, lung, kidney, white blood cells, plasma, saliva, and tears. The protein has antibacterial activity against a number of bacterial species. Missense mutations in this gene have been identified in heritable renal amyloidosis. ENSG00000090382 lysozyme
SYNM 23336 The protein encoded by this gene is an intermediate filament (IF) family member. IF proteins are cytoskeletal proteins that confer resistance to mechanical stress and are encoded by a dispersed multigene family. This protein has been found to form a linkage between desmin, which is a subunit of the IF network, and the extracellular matrix, and provides an important structural support in muscle. Two alternatively spliced variants encoding different isoforms have been described for this gene. ENSG00000182253 synemin
NEB 4703 This gene encodes nebulin, a giant protein component of the cytoskeletal matrix that coexists with the thick and thin filaments within the sarcomeres of skeletal muscle. In most vertebrates, nebulin accounts for 3 to 4% of the total myofibrillar protein. The encoded protein contains approximately 30-amino acid long modules that can be classified into 7 types and other repeated modules. Protein isoform sizes vary from 600 to 800 kD due to alternative splicing that is tissue-, species-, and developmental stage-specific. Of the 183 exons in the nebulin gene, at least 43 are alternatively spliced, although exons 143 and 144 are not found in the same transcript. Of the several thousand transcript variants predicted for nebulin, the RefSeq Project has decided to create three representative RefSeq records. Mutations in this gene are associated with recessive nemaline myopathy. ENSG00000183091 nebulin
MYBPC3 4607 MYBPC3 encodes the cardiac isoform of myosin-binding protein C. Myosin-binding protein C is a myosin-associated protein found in the cross-bridge-bearing zone (C region) of A bands in striated muscle. MYBPC3, the cardiac isoform, is expressed exclusively in heart muscle. Regulatory phosphorylation of the cardiac isoform in vivo by cAMP-dependent protein kinase (PKA) upon adrenergic stimulation may be linked to modulation of cardiac contraction. Mutations in MYBPC3 are one cause of familial hypertrophic cardiomyopathy. ENSG00000134571 myosin binding protein C, cardiac
CNN1 1264 NA ENSG00000130176 calponin 1
HYAL1 3373 This gene encodes a lysosomal hyaluronidase. Hyaluronidases intracellularly degrade hyaluronan, one of the major glycosaminoglycans of the extracellular matrix. Hyaluronan is thought to be involved in cell proliferation, migration and differentiation. This enzyme is active at an acidic pH and is the major hyaluronidase in plasma. Mutations in this gene are associated with mucopolysaccharidosis type IX, or hyaluronidase deficiency. The gene is one of several related genes in a region of chromosome 3p21.3 associated with tumor suppression. Multiple transcript variants encoding different isoforms have been found for this gene. ENSG00000114378 hyaluronoglucosaminidase 1
CALM3 808 NA ENSG00000160014 calmodulin 3 (phosphorylase kinase, delta)
CALM2 805 This gene is a member of the calmodulin gene family. There are three distinct calmodulin genes dispersed throughout the genome that encode the identical protein, but differ at the nucleotide level. Calmodulin is a calcium binding protein that plays a role in signaling pathways, cell cycle progression and proliferation. Several infants with severe forms of long-QT syndrome (LQTS) who displayed life-threatening ventricular arrhythmias together with delayed neurodevelopment and epilepsy were found to have mutations in either this gene or another member of the calmodulin gene family (PMID:23388215). Mutations in this gene have also been identified in patients with less severe forms of LQTS (PMID:24917665), while mutations in another calmodulin gene family member have been associated with catecholaminergic polymorphic ventricular tachycardia (CPVT)(PMID:23040497), a rare disorder thought to be the cause of a significant fraction of sudden cardiac deaths in young individuals. Pseudogenes of this gene are found on chromosomes 10, 13, and 17. Alternative splicing results in multiple transcript variants encoding different isoforms. ENSG00000160014 calmodulin 2 (phosphorylase kinase, delta)
REG1B 5968 This gene is a type I subclass member of the Reg gene family. The Reg gene family is a multigene family grouped into four subclasses, types I, II, III and IV based on the primary structures of the encoded proteins. This gene encodes a protein secreted by the exocrine pancreas that is highly similar to the REG1A protein. The related REG1A protein is associated with islet cell regeneration and diabetogenesis, and may be involved in pancreatic lithogenesis. Reg family members REG1A, REGL, PAP and this gene are tandemly clustered on chromosome 2p12 and may have arisen from the same ancestral gene by gene duplication. ENSG00000172023 regenerating family member 1 beta
G0S2 50486 NA ENSG00000123689 G0/G1 switch 2
C12orf75 387882 NA ENSG00000235162 chromosome 12 open reading frame 75
TNNT2 7139 The protein encoded by this gene is the tropomyosin-binding subunit of the troponin complex, which is located on the thin filament of striated muscles and regulates muscle contraction in response to alterations in intracellular calcium ion concentration. Mutations in this gene have been associated with familial hypertrophic cardiomyopathy as well as with dilated cardiomyopathy. Transcripts for this gene undergo alternative splicing that results in many tissue-specific isoforms; however, the full-length nature of some of these variants has not yet been determined. ENSG00000118194 troponin T2, cardiac type
FBN1 2200 This gene encodes a member of the fibrillin family of proteins. The encoded preproprotein is proteolytically processed to generate two proteins including the extracellular matrix component fibrillin-1 and the protein hormone asprosin. Fibrillin-1 is an extracellular matrix glycoprotein that serves as a structural component of calcium-binding microfibrils. These microfibrils provide force-bearing structural support in elastic and nonelastic connective tissue throughout the body. Asprosin, secreted by white adipose tissue, has been shown to regulate glucose homeostasis. Mutations in this gene are associated with Marfan syndrome and the related MASS phenotype, as well as ectopia lentis syndrome, Weill-Marchesani syndrome, Shprintzen-Goldberg syndrome and neonatal progeroid syndrome. ENSG00000166147 fibrillin 1
AHNAK 79026 NA ENSG00000124942 AHNAK nucleoprotein
KRT2 3849 The protein encoded by this gene is a member of the keratin gene family. The type II cytokeratins consist of basic or neutral proteins which are arranged in pairs of heterotypic keratin chains coexpressed during differentiation of simple and stratified epithelial tissues. This type II cytokeratin is expressed largely in the upper spinous layer of epidermal keratinocytes and mutations in this gene have been associated with bullous congenital ichthyosiform erythroderma. The type II cytokeratins are clustered in a region of chromosome 12q12-q13. ENSG00000172867 keratin 2
MYOZ1 58529 The protein encoded by this gene is primarily expressed in the skeletal muscle, and belongs to the myozenin family. Members of this family function as calcineurin-interacting proteins that help tether calcineurin to the sarcomere of cardiac and skeletal muscle. They play an important role in modulation of calcineurin signaling. ENSG00000177791 myozenin 1
SAA2-SAA4 100528017 This locus represents naturally occurring read-through transcription between the neighboring serum amyloid A2 and serum amyloid A4 genes on chromosome 11. The read-through transcript produces a fusion protein that shares sequence identity with each individual gene product. ENSG00000255071 SAA2-SAA4 readthrough
RASD1 51655 This gene encodes a member of the Ras superfamily of small GTPases and is induced by dexamethasone. The encoded protein is an activator of G-protein signaling and acts as a direct nucleotide exchange factor for Gi-Go proteins. This protein interacts with the neuronal nitric oxide adaptor protein CAPON, and a nuclear adaptor protein FE65, which interacts with the Alzheimer’s disease amyloid precursor protein. This gene may play a role in dexamethasone-induced alterations in cell morphology, growth and cell-extracellular matrix interactions. Epigenetic inactivation of this gene is closely correlated with resistance to dexamethasone in multiple myeloma cells. Alternatively spliced transcript variants encoding different isoforms have been found for this gene. ENSG00000108551 ras related dexamethasone induced 1
ENO1 2023 This gene encodes alpha-enolase, one of three enolase isoenzymes found in mammals. Each isoenzyme is a homodimer composed of 2 alpha, 2 gamma, or 2 beta subunits, and functions as a glycolytic enzyme. In addition, alpha-enolase functions as a structural lens protein (tau-crystallin) in the monomeric form. Alternative splicing of this gene results in a shorter isoform that has been shown to bind to the c-myc promoter and function as a tumor suppressor. Several pseudogenes have been identified, including one on the long arm of chromosome 1. Alpha-enolase has also been identified as an autoantigen in Hashimoto encephalopathy. ENSG00000074800 enolase 1
ANXA1 301 This gene encodes a membrane-localized protein that binds phospholipids. This protein inhibits phospholipase A2 and has anti-inflammatory activity. Loss of function or expression of this gene has been detected in multiple tumors. ENSG00000135046 annexin A1
YWHAZ 7534 This gene product belongs to the 14-3-3 family of proteins which mediate signal transduction by binding to phosphoserine-containing proteins. This highly conserved protein family is found in both plants and mammals, and this protein is 99% identical to the mouse, rat and sheep orthologs. The encoded protein interacts with IRS1 protein, suggesting a role in regulating insulin sensitivity. Several transcript variants that differ in the 5’ UTR but that encode the same protein have been identified for this gene. ENSG00000164924 tyrosine 3-monooxygenase/tryptophan 5-monooxygenase activation protein zeta
TG 7038 Thyroglobulin (Tg) is a glycoprotein homodimer produced predominantly by the thyroid gland. It acts as a substrate for the synthesis of thyroxine and triiodothyronine as well as the storage of the inactive forms of thyroid hormone and iodine. Thyroglobulin is secreted from the endoplasmic reticulum to its site of iodination, and subsequent thyroxine biosynthesis, in the follicular lumen. Mutations in this gene cause thyroid dyshormonogenesis, manifested as goiter, and are associated with moderate to severe congenital hypothyroidism. Polymorphisms in this gene are associated with susceptibility to autoimmune thyroid diseases (AITD) such as Graves disease and Hashimoto thyroiditis. ENSG00000042832 thyroglobulin
DAB2 1601 This gene encodes a mitogen-responsive phosphoprotein. It is expressed in normal ovarian epithelial cells, but is down-regulated or absent from ovarian carcinoma cell lines, suggesting its role as a tumor suppressor. This protein binds to the SH3 domains of GRB2, an adaptor protein that couples tyrosine kinase receptors to SOS (a guanine nucleotide exchange factor for Ras), via its C-terminal proline-rich sequences, and may thus modulate growth factor/Ras pathways by competing with SOS for binding to GRB2. Alternatively spliced transcript variants encoding different isoforms have been found for this gene. ENSG00000153071 DAB2, clathrin adaptor protein
CSDE1 7812 NA ENSG00000009307 cold shock domain containing E1
TTC9 23508 This gene encodes a protein that contains three tetratricopeptide repeats. The gene has been shown to be hormonally regulated in breast cancer cells and may play a role in cancer cell invasion and metastasis. ENSG00000133985 tetratricopeptide repeat domain 9
ATP2A1 487 This gene encodes one of the SERCA Ca(2+)-ATPases, which are intracellular pumps located in the sarcoplasmic or endoplasmic reticula of muscle cells. This enzyme catalyzes the hydrolysis of ATP coupled with the translocation of calcium from the cytosol to the sarcoplasmic reticulum lumen, and is involved in muscular excitation and contraction. Mutations in this gene cause some autosomal recessive forms of Brody disease, characterized by increasing impairment of muscular relaxation during exercise. Alternative splicing results in three transcript variants encoding different isoforms. ENSG00000196296 ATPase sarcoplasmic/endoplasmic reticulum Ca2+ transporting 1
TLE2 7089 NA ENSG00000065717 transducin like enhancer of split 2
HBA1 3039 The human alpha globin gene cluster located on chromosome 16 spans about 30 kb and includes seven loci: 5’- zeta - pseudozeta - mu - pseudoalpha-1 - alpha-2 - alpha-1 - theta - 3’. The alpha-2 (HBA2) and alpha-1 (HBA1) coding sequences are identical. These genes differ slightly over the 5’ untranslated regions and the introns, but they differ significantly over the 3’ untranslated regions. Two alpha chains plus two beta chains constitute HbA, which in normal adult life comprises about 97% of the total hemoglobin; alpha chains combine with delta chains to constitute HbA-2, which with HbF (fetal hemoglobin) makes up the remaining 3% of adult hemoglobin. Alpha thalassemias result from deletions of each of the alpha genes as well as deletions of both HBA2 and HBA1; some nondeletion alpha thalassemias have also been reported. ENSG00000206172 hemoglobin subunit alpha 1
ENG 2022 This gene encodes a homodimeric transmembrane protein which is a major glycoprotein of the vascular endothelium. This protein is a component of the transforming growth factor beta receptor complex and it binds to the beta1 and beta3 peptides with high affinity. Mutations in this gene cause hereditary hemorrhagic telangiectasia, also known as Osler-Rendu-Weber syndrome 1, an autosomal dominant multisystemic vascular dysplasia. This gene may also be involved in preeclampsia and several types of cancer. Alternatively spliced transcript variants encoding different isoforms have been found for this gene. ENSG00000106991 endoglin
GSTP1 2950 Glutathione S-transferases (GSTs) are a family of enzymes that play an important role in detoxification by catalyzing the conjugation of many hydrophobic and electrophilic compounds with reduced glutathione. Based on their biochemical, immunologic, and structural properties, the soluble GSTs are categorized into 4 main classes: alpha, mu, pi, and theta. This GST family member is a polymorphic gene encoding active, functionally different GSTP1 variant proteins that are thought to function in xenobiotic metabolism and play a role in susceptibility to cancer, and other diseases. ENSG00000084207 glutathione S-transferase pi 1
ENTPD1 953 The protein encoded by this gene is a plasma membrane protein that hydrolyzes extracellular ATP and ADP to AMP. Inhibition of this protein’s activity may confer anticancer benefits. Several transcript variants encoding different isoforms have been found for this gene. ENSG00000138185 ectonucleoside triphosphate diphosphohydrolase 1
STAG3L5P-PVRIG2P-PILRB 101752399 This locus represents naturally occurring readthrough transcription among the neighboring LOC101735302 (stromal antigen 3 pseudogene), LOC101752334 (poliovirus receptor related immunoglobulin domain containing pseudogene) and PILRB (paired immunoglobin-like type 2 receptor beta) genes on chromosome 7. The readthrough transcript is a candidate for nonsense-mediated mRNA decay (NMD), and is unlikely to produce a protein product. ENSG00000272752 STAG3L5P-PVRIG2P-PILRB readthrough
GPX3 2878 This gene product belongs to the glutathione peroxidase family, which functions in the detoxification of hydrogen peroxide. It contains a selenocysteine (Sec) residue at its active site. The selenocysteine is encoded by the UGA codon, which normally signals translation termination. The 3’ UTR of Sec-containing genes have a common stem-loop structure, the sec insertion sequence (SECIS), which is necessary for the recognition of UGA as a Sec codon rather than as a stop signal. ENSG00000211445 glutathione peroxidase 3
MX1 4599 This gene encodes a guanosine triphosphate (GTP)-metabolizing protein that participates in the cellular antiviral response. The encoded protein is induced by type I and type II interferons and antagonizes the replication process of several different RNA and DNA viruses. There is a related gene located adjacent to this gene on chromosome 21, and there are multiple pseudogenes located in a cluster on chromosome 4. Alternative splicing results in multiple transcript variants. ENSG00000157601 MX dynamin like GTPase 1
FURIN 5045 This gene encodes a member of the subtilisin-like proprotein convertase family, which includes proteases that process protein and peptide precursors trafficking through regulated or constitutive branches of the secretory pathway. It encodes a type 1 membrane bound protease that is expressed in many tissues, including neuroendocrine, liver, gut, and brain. The encoded protein undergoes an initial autocatalytic processing event in the ER and then sorts to the trans-Golgi network through endosomes where a second autocatalytic event takes place and the catalytic activity is acquired. The product of this gene is one of the seven basic amino acid-specific members which cleave their substrates at single or paired basic residues. Some of its substrates include proparathyroid hormone, transforming growth factor beta 1 precursor, proalbumin, pro-beta-secretase, membrane type-1 matrix metalloproteinase, beta subunit of pro-nerve growth factor and von Willebrand factor. It is also thought to be one of the proteases responsible for the activation of HIV envelope glycoproteins gp160 and gp140 and may play a role in tumor progression. This gene is located in close proximity to family member proprotein convertase subtilisin/kexin type 6 and upstream of the FES oncogene. Alternative splicing results in multiple transcript variants. ENSG00000140564 furin, paired basic amino acid cleaving enzyme
TNNC2 7125 Troponin (Tn), a key protein complex in the regulation of striated muscle contraction, is composed of 3 subunits. The Tn-I subunit inhibits actomyosin ATPase, the Tn-T subunit binds tropomyosin and Tn-C, while the Tn-C subunit binds calcium and overcomes the inhibitory action of the troponin complex on actin filaments. The protein encoded by this gene is the Tn-C subunit. ENSG00000101470 troponin C2, fast skeletal type
MYL1 4632 Myosin is a hexameric ATPase cellular motor protein. It is composed of two heavy chains, two nonphosphorylatable alkali light chains, and two phosphorylatable regulatory light chains. This gene encodes a myosin alkali light chain expressed in fast skeletal muscle. Two transcript variants have been identified for this gene. ENSG00000168530 myosin light chain 1
PDLIM3 27295 The protein encoded by this gene contains a PDZ domain and a LIM domain, indicating that it may be involved in cytoskeletal assembly. In support of this, the encoded protein has been shown to bind the spectrin-like repeats of alpha-actinin-2 and to colocalize with alpha-actinin-2 at the Z lines of skeletal muscle. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene. Aberrant alternative splicing of this gene may play a role in myotonic dystrophy. ENSG00000154553 PDZ and LIM domain 3
DCXR 51181 The protein encoded by this gene acts as a homotetramer to catalyze diacetyl reductase and L-xylulose reductase reactions. The encoded protein may play a role in the uronate cycle of glucose metabolism and in the cellular osmoregulation in the proximal renal tubules. Defects in this gene are a cause of pentosuria. Two transcript variants encoding different isoforms have been found for this gene. ENSG00000169738 dicarbonyl/L-xylulose reductase
PBXIP1 57326 The protein encoded by this gene interacts with the PBX1 homeodomain protein, inhibiting its transcriptional activation potential by preventing its binding to DNA. The encoded protein, which is primarily cytosolic but can shuttle to the nucleus, also can interact with estrogen receptors alpha and beta and promote the proliferation of breast cancer, brain tumors, and lung cancer. Several transcript variants encoding different isoforms have been found for this gene. More variants exist, but their full-length natures have yet to be determined. ENSG00000163346 PBX homeobox interacting protein 1
CEBPA 1050 This intronless gene encodes a transcription factor that contains a basic leucine zipper (bZIP) domain and recognizes the CCAAT motif in the promoters of target genes. The encoded protein functions in homodimers and also heterodimers with CCAAT/enhancer-binding proteins beta and gamma. Activity of this protein can modulate the expression of genes involved in cell cycle regulation as well as in body weight homeostasis. Mutation of this gene is associated with acute myeloid leukemia. The use of alternative in-frame non-AUG (GUG) and AUG start codons results in protein isoforms with different lengths. Differential translation initiation is mediated by an out-of-frame, upstream open reading frame which is located between the GUG and the first AUG start codons. ENSG00000245848 CCAAT/enhancer binding protein alpha
LOC100129518 100129518 NA ENSG00000112096 uncharacterized LOC100129518
SOD2 6648 This gene is a member of the iron/manganese superoxide dismutase family. It encodes a mitochondrial protein that forms a homotetramer and binds one manganese ion per subunit. This protein binds to the superoxide byproducts of oxidative phosphorylation and converts them to hydrogen peroxide and diatomic oxygen. Mutations in this gene have been associated with idiopathic cardiomyopathy (IDC), premature aging, sporadic motor neuron disease, and cancer. Alternative splicing of this gene results in multiple transcript variants. A related pseudogene has been identified on chromosome 1. ENSG00000112096 superoxide dismutase 2, mitochondrial
CRYAB 1410 Mammalian lens crystallins are divided into alpha, beta, and gamma families. Alpha crystallins are composed of two gene products: alpha-A and alpha-B, for acidic and basic, respectively. Alpha crystallins can be induced by heat shock and are members of the small heat shock protein (HSP20) family. They act as molecular chaperones although they do not renature proteins and release them in the fashion of a true chaperone; instead they hold them in large soluble aggregates. Post-translational modifications decrease the ability to chaperone. These heterogeneous aggregates consist of 30-40 subunits; the alpha-A and alpha-B subunits have a 3:1 ratio, respectively. Two additional functions of alpha crystallins are an autokinase activity and participation in the intracellular architecture. The encoded protein has been identified as a moonlighting protein based on its ability to perform mechanistically distinct functions. Alpha-A and alpha-B gene products are differentially expressed; alpha-A is preferentially restricted to the lens and alpha-B is expressed widely in many tissues and organs. Elevated expression of alpha-B crystallin occurs in many neurological diseases; a missense mutation cosegregated in a family with a desmin-related myopathy. Alternative splicing results in multiple transcript variants. ENSG00000109846 crystallin alpha B
FASN 2194 The enzyme encoded by this gene is a multifunctional protein. Its main function is to catalyze the synthesis of palmitate from acetyl-CoA and malonyl-CoA, in the presence of NADPH, into long-chain saturated fatty acids. In some cancer cell lines, this protein has been found to be fused with estrogen receptor-alpha (ER-alpha), in which the N-terminus of FAS is fused in-frame with the C-terminus of ER-alpha. ENSG00000169710 fatty acid synthase
CXCL2 2920 This antimicrobial gene is part of a chemokine superfamily that encodes secreted proteins involved in immunoregulatory and inflammatory processes. The superfamily is divided into four subfamilies based on the arrangement of the N-terminal cysteine residues of the mature peptide. This chemokine, a member of the CXC subfamily, is expressed at sites of inflammation and may suppress hematopoietic progenitor cell proliferation. ENSG00000081041 C-X-C motif chemokine ligand 2
write.table(as.factor(out$query),
            paste0("../utilities/GTEX2013_sparse_fac_sqrt/gene_names_clus_", 4, ".txt"),
            col.names = FALSE, row.names = FALSE, quote = FALSE)
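
The export step above is repeated for every factor with the factor index typed in by hand. The following is a minimal sketch of how it could be wrapped in a helper, assuming the same output directory and file-name pattern. The name `write_factor_genes` and its arguments are hypothetical and not part of the original analysis.

```r
# Hypothetical helper (an assumption, not in the original analysis):
# writes the queried Ensembl IDs for factor k to gene_names_clus_<k>.txt,
# one ID per line.
write_factor_genes <- function(query_ids, k,
                               out_dir = "../utilities/GTEX2013_sparse_fac_sqrt") {
  out_file <- file.path(out_dir, paste0("gene_names_clus_", k, ".txt"))
  # writeLines() takes plain character IDs, so no factor conversion is needed
  writeLines(as.character(query_ids), out_file)
  # return the path invisibly so callers can check where the file went
  invisible(out_file)
}
```

Under these assumptions, `write_factor_genes(out$query, 4)` should produce the same one-ID-per-line file as the `write.table` call, and the index only has to be changed in one place per factor.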

Factor 5 Annotations

out <- mygene::queryMany(gene_list[5,],  scopes="ensembl.gene", fields=c("name", "summary", "symbol"), species="human");
## Finished
## Pass returnall=TRUE to return lists of duplicate or missing query terms.
kable(as.data.frame(out))
summary X_id symbol name query notfound
This gene encodes a member of the cytochrome P450 superfamily of enzymes. The cytochrome P450 proteins are monooxygenases which catalyze many reactions involved in drug metabolism and synthesis of cholesterol, steroids and other lipids. This protein localizes to the endoplasmic reticulum. It has both 17alpha-hydroxylase and 17,20-lyase activities and is a key enzyme in the steroidogenic pathway that produces progestins, mineralocorticoids, glucocorticoids, androgens, and estrogens. Mutations in this gene are associated with isolated steroid-17 alpha-hydroxylase deficiency, 17-alpha-hydroxylase/17,20-lyase deficiency, pseudohermaphroditism, and adrenal hyperplasia. 1586 CYP17A1 cytochrome P450 family 17 subfamily A member 1 ENSG00000148795 NA
This gene encodes a member of the cytochrome P450 superfamily of enzymes. The cytochrome P450 proteins are monooxygenases which catalyze many reactions involved in drug metabolism and synthesis of cholesterol, steroids and other lipids. This protein localizes to the mitochondrial inner membrane and is involved in the conversion of progesterone to cortisol in the adrenal cortex. Mutations in this gene cause congenital adrenal hyperplasia due to 11-beta-hydroxylase deficiency. Transcript variants encoding different isoforms have been noted for this gene. 1584 CYP11B1 cytochrome P450 family 11 subfamily B member 1 ENSG00000160882 NA
NA ENSG00000211895 IGHA1 immunoglobulin heavy constant alpha 1 ENSG00000211895 NA
The protein encoded by this gene plays a key role in the acute regulation of steroid hormone synthesis by enhancing the conversion of cholesterol into pregnenolone. This protein permits the cleavage of cholesterol into pregnenolone by mediating the transport of cholesterol from the outer mitochondrial membrane to the inner mitochondrial membrane. Mutations in this gene are a cause of congenital lipoid adrenal hyperplasia (CLAH), also called lipoid CAH. A pseudogene of this gene is located on chromosome 13. 6770 STAR steroidogenic acute regulatory protein ENSG00000147465 NA
Actins are highly conserved proteins that are involved in various types of cell motility and in the maintenance of the cytoskeleton. Three types of actins, alpha, beta and gamma, have been identified in vertebrates. Alpha actins are found in muscle tissues and are a major constituent of the contractile apparatus. The beta and gamma actins co-exist in most cell types as components of the cytoskeleton and as mediators of internal cell motility. This gene encodes actin gamma 2; a smooth muscle actin found in enteric tissues. Alternative splicing results in multiple transcript variants encoding distinct isoforms. Based on similarity to peptide cleavage of related actins, the mature protein of this gene is formed by removal of two N-terminal peptides. 72 ACTG2 actin, gamma 2, smooth muscle, enteric ENSG00000163017 NA
The protein encoded by this gene is a major apoprotein of the chylomicron. It binds to a specific liver and peripheral cell receptor, and is essential for the normal catabolism of triglyceride-rich lipoprotein constituents. This gene maps to chromosome 19 in a cluster with the related apolipoprotein C1 and C2 genes. Mutations in this gene result in familial dysbetalipoproteinemia, or type III hyperlipoproteinemia (HLP III), in which increased plasma cholesterol and triglycerides are the consequence of impaired clearance of chylomicron and VLDL remnants. Alternative splicing results in multiple transcript variants. 348 APOE apolipoprotein E ENSG00000130203 NA
This gene encodes a flavin adenine dinucleotide (FAD)-dependent oxidoreductase which catalyzes the reduction of the delta-24 double bond of sterol intermediates during cholesterol biosynthesis. The protein contains a leader sequence that directs it to the endoplasmic reticulum membrane. Missense mutations in this gene have been associated with desmosterolosis. Also, reduced expression of the gene occurs in the temporal cortex of Alzheimer disease patients and overexpression has been observed in adrenal gland cancer cells. 1718 DHCR24 24-dehydrocholesterol reductase ENSG00000116133 NA
Protamines substitute for histones in the chromatin of sperm during the haploid phase of spermatogenesis, and are the major DNA-binding proteins in the nucleus of sperm in many vertebrates. They package the sperm DNA into a highly condensed complex in a volume less than 5% of a somatic cell nucleus. Many mammalian species have only one protamine (protamine 1); however, a few species, including human and mouse, have two. This gene encodes protamine 2, which is cleaved to give rise to a family of protamine 2 peptides. Alternatively spliced transcript variants have also been found for this gene. 5620 PRM2 protamine 2 ENSG00000122304 NA
This gene encodes a member of the cytochrome P450 superfamily of enzymes. The cytochrome P450 proteins are monooxygenases which catalyze many reactions involved in drug metabolism and synthesis of cholesterol, steroids and other lipids. This protein localizes to the mitochondrial inner membrane and catalyzes the conversion of cholesterol to pregnenolone, the first and rate-limiting step in the synthesis of the steroid hormones. Two transcript variants encoding different isoforms have been found for this gene. The cellular location of the smaller isoform is unclear since it lacks the mitochondrial-targeting transit peptide. 1583 CYP11A1 cytochrome P450 family 11 subfamily A member 1 ENSG00000140459 NA
This gene is a member of the tropomyosin family of highly conserved, widely distributed actin-binding proteins involved in the contractile system of striated and smooth muscles and the cytoskeleton of non-muscle cells. Tropomyosin is composed of two alpha-helical chains arranged as a coiled-coil. It is polymerized end to end along the two grooves of actin filaments and provides stability to the filaments. The encoded protein is one type of alpha helical chain that forms the predominant tropomyosin of striated muscle, where it also functions in association with the troponin complex to regulate the calcium-dependent interaction of actin and myosin during muscle contraction. In smooth muscle and non-muscle cells, alternatively spliced transcript variants encoding a range of isoforms have been described. Mutations in this gene are associated with type 3 familial hypertrophic cardiomyopathy. 7168 TPM1 tropomyosin 1 (alpha) ENSG00000140416 NA
The Fos gene family consists of 4 members: FOS, FOSB, FOSL1, and FOSL2. These genes encode leucine zipper proteins that can dimerize with proteins of the JUN family, thereby forming the transcription factor complex AP-1. As such, the FOS proteins have been implicated as regulators of cell proliferation, differentiation, and transformation. 2355 FOSL2 FOS like 2, AP-1 transcription factor subunit ENSG00000075426 NA
This gene encodes one of six different actin proteins. Actins are highly conserved proteins that are involved in cell motility, structure, and integrity. This actin is a major constituent of the contractile apparatus and one of the two nonmuscle cytoskeletal actins. 60 ACTB actin, beta ENSG00000075624 NA
The protein encoded by this gene is a plasma membrane receptor for high density lipoprotein cholesterol (HDL). The encoded protein mediates cholesterol transfer to and from HDL. In addition, this protein is a receptor for hepatitis C virus glycoprotein E2. Two transcript variants encoding different isoforms have been found for this gene. 949 SCARB1 scavenger receptor class B member 1 ENSG00000073060 NA
The protein encoded by this gene is an isozyme of the long-chain fatty-acid-coenzyme A ligase family. Although differing in substrate specificity, subcellular localization, and tissue distribution, all isozymes of this family convert free long-chain fatty acids into fatty acyl-CoA esters, and thereby play a key role in lipid biosynthesis and fatty acid degradation. Several transcript variants encoding different isoforms have been found for this gene. 2180 ACSL1 acyl-CoA synthetase long-chain family member 1 ENSG00000151726 NA
This gene represents a ubiquitin gene, ubiquitin C. The encoded protein is a polyubiquitin precursor. Conjugation of ubiquitin monomers or polymers can lead to various effects within a cell, depending on the residues to which ubiquitin is conjugated. Ubiquitination has been associated with protein degradation, DNA repair, cell cycle regulation, kinase modification, endocytosis, and regulation of other cell signaling pathways. 7316 UBC ubiquitin C ENSG00000150991 NA
The protein encoded by this gene is a cytoplasmic enzyme involved in energy homeostasis. The encoded protein reversibly catalyzes the transfer of phosphate between ATP and various phosphogens such as creatine phosphate. It acts as a homodimer in brain as well as in other tissues, and as a heterodimer with a similar muscle isozyme in heart. The encoded protein is a member of the ATP:guanido phosphotransferase protein family. A pseudogene of this gene has been characterized. 1152 CKB creatine kinase B ENSG00000166165 NA
This gene encodes a preproprotein, which is processed to yield both alpha and beta chains, which subsequently combine as a tetramer to produce haptoglobin. Haptoglobin functions to bind free plasma hemoglobin, which allows degradative enzymes to gain access to the hemoglobin, while at the same time preventing loss of iron through the kidneys and protecting the kidneys from damage by hemoglobin. Mutations in this gene and/or its regulatory regions cause ahaptoglobinemia or hypohaptoglobinemia. This gene has also been linked to diabetic nephropathy, the incidence of coronary artery disease in type 1 diabetes, Crohn’s disease, inflammatory disease behavior, primary sclerosing cholangitis, susceptibility to idiopathic Parkinson’s disease, and a reduced incidence of Plasmodium falciparum malaria. The protein encoded also exhibits antimicrobial activity against bacteria. A similar duplicated gene is located next to this gene on chromosome 16. Multiple transcript variants encoding different isoforms have been found for this gene. 3240 HP haptoglobin ENSG00000257017 NA
The expression of DUSP1 gene is induced in human skin fibroblasts by oxidative/heat stress and growth factors. It specifies a protein with structural features similar to members of the non-receptor-type protein-tyrosine phosphatase family, and which has significant amino-acid sequence similarity to a Tyr/Ser-protein phosphatase encoded by the late gene H1 of vaccinia virus. The bacterially expressed and purified DUSP1 protein has intrinsic phosphatase activity, and specifically inactivates mitogen-activated protein (MAP) kinase in vitro by the concomitant dephosphorylation of both its phosphothreonine and phosphotyrosine residues. Furthermore, it suppresses the activation of MAP kinase by oncogenic ras in extracts of Xenopus oocytes. Thus, DUSP1 may play an important role in the human cellular response to environmental stress as well as in the negative regulation of cellular proliferation. 1843 DUSP1 dual specificity phosphatase 1 ENSG00000120129 NA
Aldehyde oxidase produces hydrogen peroxide and, under certain conditions, can catalyze the formation of superoxide. Aldehyde oxidase is a candidate gene for amyotrophic lateral sclerosis. 316 AOX1 aldehyde oxidase 1 ENSG00000138356 NA
NA 5619 PRM1 protamine 1 ENSG00000175646 NA
The protein encoded by this gene is a transformation and shape-change sensitive actin cross-linking/gelling protein found in fibroblasts and smooth muscle. Its expression is down-regulated in many cell lines, and this down-regulation may be an early and sensitive marker for the onset of transformation. A functional role of this protein is unclear. Two transcript variants encoding the same protein have been found for this gene. 6876 TAGLN transgelin ENSG00000149591 NA
The protein encoded by this gene is a component of desmosomes and of the epidermal cornified envelope in keratinocytes. The N-terminal domain of this protein interacts with the plasma membrane and its C-terminus interacts with intermediate filaments. Through its rod domain, this protein forms complexes with envoplakin. This protein may serve as a link between the cornified envelope and desmosomes as well as intermediate filaments. AKT1/PKB, a protein kinase mediating a variety of cell growth and survival signaling processes, is reported to interact with this protein, suggesting a possible role for this protein as a localization signal in AKT1-mediated signaling. 5493 PPL periplakin ENSG00000118898 NA
This gene encodes a member of the profilin family of small actin-binding proteins. The encoded protein plays an important role in actin dynamics by regulating actin polymerization in response to extracellular signals. Deletion of this gene is associated with Miller-Dieker syndrome, and the encoded protein may also play a role in Huntington disease. Multiple pseudogenes of this gene are located on chromosome 1. 5216 PFN1 profilin 1 ENSG00000108518 NA
This gene is a member of the N-myc downregulated gene family which belongs to the alpha/beta hydrolase superfamily. The protein encoded by this gene is a cytoplasmic protein involved in stress responses, hormone responses, cell growth, and differentiation. The encoded protein is necessary for p53-mediated caspase activation and apoptosis. Mutations in this gene are a cause of Charcot-Marie-Tooth disease type 4D, and expression of this gene may be a prognostic indicator for several types of cancer. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene. 10397 NDRG1 N-myc downstream regulated 1 ENSG00000104419 NA
The protein encoded by this gene is a member of the keratin gene family. The type II cytokeratins consist of basic or neutral proteins which are arranged in pairs of heterotypic keratin chains coexpressed during differentiation of simple and stratified epithelial tissues. This type II cytokeratin is specifically expressed in the basal layer of the epidermis with family member KRT14. Mutations in these genes have been associated with a complex of diseases termed epidermolysis bullosa simplex. The type II cytokeratins are clustered in a region of chromosome 12q12-q13. 3852 KRT5 keratin 5 ENSG00000186081 NA
Epoxide hydrolase is a critical biotransformation enzyme that converts epoxides from the degradation of aromatic compounds to trans-dihydrodiols which can be conjugated and excreted from the body. Epoxide hydrolase functions in both the activation and detoxification of epoxides. Mutations in this gene cause preeclampsia, epoxide hydrolase deficiency or increased epoxide hydrolase activity. Alternatively spliced transcript variants encoding the same protein have been found for this gene. 2052 EPHX1 epoxide hydrolase 1 ENSG00000143819 NA
NA ENSG00000211899 IGHM immunoglobulin heavy constant mu ENSG00000211899 NA
This gene encodes a multidomain serine protease inhibitor that contains 15 potential inhibitory domains. The encoded preproprotein is proteolytically processed to generate multiple protein products, which may exhibit unique activities and specificities. These proteins may play a role in skin and hair morphogenesis, as well as anti-inflammatory and antimicrobial protection of mucous epithelia. Mutations in this gene may result in Netherton syndrome, a disorder characterized by ichthyosis, defective cornification, and atopy. This gene is present in a gene cluster on chromosome 5. Alternative splicing results in multiple transcript variants. 11005 SPINK5 serine peptidase inhibitor, Kazal type 5 ENSG00000133710 NA
The protein encoded by this gene is a cell membrane protein that may be involved in iron export from duodenal epithelial cells. Defects in this gene are a cause of hemochromatosis type 4 (HFE4). 30061 SLC40A1 solute carrier family 40 member 1 ENSG00000138449 NA
NA ENSG00000211890 IGHA2 immunoglobulin heavy constant alpha 2 (A2m marker) ENSG00000211890 NA
The Fos gene family consists of 4 members: FOS, FOSB, FOSL1, and FOSL2. These genes encode leucine zipper proteins that can dimerize with proteins of the JUN family, thereby forming the transcription factor complex AP-1. As such, the FOS proteins have been implicated as regulators of cell proliferation, differentiation, and transformation. In some cases, expression of the FOS gene has also been associated with apoptotic cell death. 2353 FOS Fos proto-oncogene, AP-1 transcription factor subunit ENSG00000170345 NA
Muscle myosin is a hexameric protein containing 2 heavy chain subunits, 2 alkali light chain subunits, and 2 regulatory light chain subunits. This gene encodes the beta (or slow) heavy chain subunit of cardiac myosin. It is expressed predominantly in normal human ventricle. It is also expressed in skeletal muscle tissues rich in slow-twitch type I muscle fibers. Changes in the relative abundance of this protein and the alpha (or fast) heavy subunit of cardiac myosin correlate with the contractile velocity of cardiac muscle. Its expression is also altered during thyroid hormone depletion and hemodynamic overloading. Mutations in this gene are associated with familial hypertrophic cardiomyopathy, myosin storage myopathy, dilated cardiomyopathy, and Laing early-onset distal myopathy. 4625 MYH7 myosin, heavy chain 7, cardiac muscle, beta ENSG00000092054 NA
The protein encoded by this gene is a member of the keratin gene family. The keratins are intermediate filament proteins responsible for the structural integrity of epithelial cells and are subdivided into cytokeratins and hair keratins. Most of the type I cytokeratins consist of acidic proteins which are arranged in pairs of heterotypic keratin chains and are clustered in a region on chromosome 17q21.2. 3866 KRT15 keratin 15 ENSG00000171346 NA
Laminins, a family of extracellular matrix glycoproteins, are the major noncollagenous constituent of basement membranes. They have been implicated in a wide variety of biological processes including cell adhesion, differentiation, migration, signaling, neurite outgrowth and metastasis. Many of the effects of laminin are mediated through interactions with cell surface receptors. These receptors include members of the integrin family, as well as non-integrin laminin-binding proteins. This gene encodes a high-affinity, non-integrin family, laminin receptor 1. This receptor has been variously called 67 kD laminin receptor, 37 kD laminin receptor precursor (37LRP) and p40 ribosome-associated protein. The amino acid sequence of laminin receptor 1 is highly conserved through evolution, suggesting a key biological function. It has been observed that the level of the laminin receptor transcript is higher in colon carcinoma tissue and lung cancer cell line than their normal counterparts. Also, there is a correlation between the upregulation of this polypeptide in cancer cells and their invasive and metastatic phenotype. Multiple copies of this gene exist, however, most of them are pseudogenes thought to have arisen from retropositional events. Two alternatively spliced transcript variants encoding the same protein have been found for this gene. 3921 RPSA ribosomal protein SA ENSG00000168028 NA
This gene is a member of the immunoglobulin superfamily. The encoded poly-Ig receptor binds polymeric immunoglobulin molecules at the basolateral surface of epithelial cells; the complex is then transported across the cell to be secreted at the apical surface. A significant association was found between immunoglobulin A nephropathy and several SNPs in this gene. 5284 PIGR polymeric immunoglobulin receptor ENSG00000162896 NA
This gene encodes a muscle-specific class III intermediate filament. Homopolymers of this protein form a stable intracytoplasmic filamentous network connecting myofibrils to each other and to the plasma membrane. Mutations in this gene are associated with desmin-related myopathy, a familial cardiac and skeletal myopathy (CSM), and with distal myopathies. 1674 DES desmin ENSG00000175084 NA
The protein encoded by this gene is a member of the S100 family of proteins containing 2 EF-hand calcium-binding motifs. S100 proteins are localized in the cytoplasm and/or nucleus of a wide range of cells, and involved in the regulation of a number of cellular processes such as cell cycle progression and differentiation. S100 genes include at least 13 members which are located as a cluster on chromosome 1q21. This protein may function in the inhibition of casein kinase and as a cytokine. Altered expression of this protein is associated with the disease cystic fibrosis. Multiple transcript variants encoding different isoforms have been found for this gene. 6279 S100A8 S100 calcium binding protein A8 ENSG00000143546 NA
This gene encodes a glycoprotein involved in the regulation of the complement cascade. Binding of the encoded protein to complement proteins accelerates their decay, thereby disrupting the cascade and preventing damage to host cells. Antigens present on this protein constitute the Cromer blood group system (CROM). Alternative splicing results in multiple transcript variants. The predominant transcript variant encodes a membrane-bound protein, but alternatively spliced transcripts may produce soluble proteins. 1604 CD55 CD55 molecule (Cromer blood group) ENSG00000196352 NA
The protein encoded by this gene is a smooth muscle myosin belonging to the myosin heavy chain family. The gene product is a subunit of a hexameric protein that consists of two heavy chain subunits and two pairs of non-identical light chain subunits. It functions as a major contractile protein, converting chemical energy into mechanical energy through the hydrolysis of ATP. The gene encoding a human ortholog of rat NUDE1 is transcribed from the reverse strand of this gene, and its 3’ end overlaps with that of the latter. The pericentric inversion of chromosome 16 [inv(16)(p13q22)] produces a chimeric transcript that encodes a protein consisting of the first 165 residues from the N terminus of core-binding factor beta in a fusion with the C-terminal portion of the smooth muscle myosin heavy chain. This chromosomal rearrangement is associated with acute myeloid leukemia of the M4Eo subtype. Alternative splicing generates isoforms that are differentially expressed, with ratios changing during muscle cell maturation. Alternatively spliced transcript variants encoding different isoforms have been identified. 4629 MYH11 myosin, heavy chain 11, smooth muscle ENSG00000133392 NA
This gene encodes a transmembrane protein that contains multiple epidermal growth factor repeats and functions as a regulator of cell growth. The encoded protein is involved in the differentiation of several cell types including adipocytes. This gene is located in a region of chromosome 14 frequently showing uniparental disomy, and is imprinted and expressed from the paternal allele. A single nucleotide variant in this gene is associated with child and adolescent obesity and shows polar overdominance, where heterozygotes carrying an active paternal allele express the phenotype, while mutant homozygotes are normal. 8788 DLK1 delta like non-canonical Notch ligand 1 ENSG00000185559 NA
The protein encoded by this gene is a glycogen phosphorylase found predominantly in the brain. The encoded protein forms homodimers which can associate into homotetramers, the enzymatically active form of glycogen phosphorylase. The activity of this enzyme is positively regulated by AMP and negatively regulated by ATP, ADP, and glucose-6-phosphate. This enzyme catalyzes the rate-determining step in glycogen degradation. 5834 PYGB phosphorylase, glycogen; brain ENSG00000100994 NA
This gene encodes a member of the small leucine-rich proteoglycan family of proteins. Alternative splicing results in multiple transcript variants, at least one of which encodes a preproprotein that is proteolytically processed to generate the mature protein. This protein plays a role in collagen fibril assembly. Binding of this protein to multiple cell surface receptors mediates its role in tumor suppression, including a stimulatory effect on autophagy and inflammation and an inhibitory effect on angiogenesis and tumorigenesis. This gene and the related gene biglycan are thought to be the result of a gene duplication. Mutations in this gene are associated with congenital stromal corneal dystrophy in human patients. 1634 DCN decorin ENSG00000011465 NA
This gene is a member of the TIS11 family of early response genes, which are induced by various agonists such as the phorbol ester TPA and the polypeptide mitogen EGF. This gene is well conserved across species and has a promoter that contains motifs seen in other early-response genes. The encoded protein contains a distinguishing putative zinc finger domain with a repeating cys-his motif. This putative nuclear transcription factor most likely functions in regulating the response to growth factors. Alternatively spliced transcript variants encoding different isoforms have been found for this gene. 677 ZFP36L1 ZFP36 ring finger protein-like 1 ENSG00000185650 NA
This gene encodes a member of the cytochrome P450 superfamily of enzymes. The cytochrome P450 proteins are monooxygenases which catalyze many reactions involved in drug metabolism and synthesis of cholesterol, steroids and other lipids. This protein localizes to the endoplasmic reticulum and hydroxylates steroids at the 21 position. Its activity is required for the synthesis of steroid hormones including cortisol and aldosterone. Mutations in this gene cause congenital adrenal hyperplasia. A related pseudogene is located near this gene; gene conversion events involving the functional gene and the pseudogene are thought to account for many cases of steroid 21-hydroxylase deficiency. Two transcript variants encoding different isoforms have been found for this gene. 1589 CYP21A2 cytochrome P450 family 21 subfamily A member 2 ENSG00000231852 NA
Spermatogenesis is a complex process regulated by extracellular and intracellular factors as well as cellular interactions among interstitial cells of the testis, Sertoli cells, and germ cells. This gene is expressed in the testis in Sertoli cells but not germ cells. The protein encoded by this gene contains plant homeodomain (PHD) finger domains, also known as leukemia associated protein (LAP) domains, believed to be involved in transcriptional regulation. The protein, which localizes to the nucleus of transfected cells, has been implicated in the transcriptional regulation of spermatogenesis. Alternate splicing results in multiple transcript variants of this gene. 51533 PHF7 PHD finger protein 7 ENSG00000010318 NA
NA 57515 SERINC1 serine incorporator 1 ENSG00000111897 NA
This gene encodes a member of the aldo/keto reductase superfamily, which consists of more than 40 known enzymes and proteins. This member catalyzes the reduction of a number of aldehydes, including the aldehyde form of glucose, and is thereby implicated in the development of diabetic complications by catalyzing the reduction of glucose to sorbitol. Multiple pseudogenes have been identified for this gene. The nomenclature system used by the HUGO Gene Nomenclature Committee to define human aldo-keto reductase family members is known to differ from that used by the Mouse Genome Informatics database. 231 AKR1B1 aldo-keto reductase family 1 member B ENSG00000085662 NA
NA 55074 OXR1 oxidation resistance 1 ENSG00000164830 NA
NA 84669 USP32 ubiquitin specific peptidase 32 ENSG00000170832 NA
This gene encodes a member of the Notch family. Members of this Type 1 transmembrane protein family share structural characteristics including an extracellular domain consisting of multiple epidermal growth factor-like (EGF) repeats, and an intracellular domain consisting of multiple, different domain types. Notch family members play a role in a variety of developmental processes by controlling cell fate decisions. The Notch signaling network is an evolutionarily conserved intercellular signaling pathway which regulates interactions between physically adjacent cells. In Drosophila, notch interaction with its cell-bound ligands (delta, serrate) establishes an intercellular signaling pathway that plays a key role in development. Homologues of the notch-ligands have also been identified in human, but precise interactions between these ligands and the human notch homologues remain to be determined. This protein is cleaved in the trans-Golgi network, and presented on the cell surface as a heterodimer. This protein functions as a receptor for membrane bound ligands, and may play a role in vascular, renal and hepatic development. Two transcript variants encoding different isoforms have been found for this gene. 4853 NOTCH2 notch 2 ENSG00000134250 NA
NA 26959 HBP1 HMG-box transcription factor 1 ENSG00000105856 NA
NA 3488 IGFBP5 insulin like growth factor binding protein 5 ENSG00000115461 NA
This gene encodes a serine protease, which is a major constituent of the human complement subcomponent C1. C1s associates with two other complement components C1r and C1q in order to yield the first component of the serum complement system. Defects in this gene are the cause of selective C1s deficiency. 716 C1S complement component 1, s subcomponent ENSG00000182326 NA
NA 7763 ZFAND5 zinc finger AN1-type containing 5 ENSG00000107372 NA
Radixin is a cytoskeletal protein that may be important in linking actin to the plasma membrane. It is highly similar in sequence to both ezrin and moesin. The radixin gene has been localized by fluorescence in situ hybridization to 11q23. A truncated version representing a pseudogene (RDXP2) was assigned to Xp21.3. Another pseudogene that seemed to lack introns (RDXP1) was mapped to 11p by Southern and PCR analyses. Multiple alternatively spliced transcript variants encoding different isoforms have been found for this gene. 5962 RDX radixin ENSG00000137710 NA
Synaptic vesicles are responsible for regulating the storage and release of neurotransmitters in the nerve terminal. The protein encoded by this gene is an abundant integral membrane protein of cholinergic synaptic vesicles and is thought to be involved in vesicular transport. It belongs to the quinone oxidoreductase subfamily of zinc-containing alcohol dehydrogenase proteins. 10493 VAT1 vesicle amine transport 1 ENSG00000108828 NA
The protein encoded by this gene is a member of the alcohol dehydrogenase family. Members of this enzyme family metabolize a wide variety of substrates, including ethanol, retinol, other aliphatic alcohols, hydroxysteroids, and lipid peroxidation products. This encoded protein, consisting of several homo- and heterodimers of alpha, beta, and gamma subunits, exhibits high activity for ethanol oxidation and plays a major role in ethanol catabolism. Three genes encoding alpha, beta and gamma subunits are tandemly organized in a genomic segment as a gene cluster. Two transcript variants encoding different isoforms have been found for this gene. 125 ADH1B alcohol dehydrogenase 1B (class I), beta polypeptide ENSG00000196616 NA
NA 51313 FAM198B family with sequence similarity 198 member B ENSG00000164125 NA
NA 9467 SH3BP5 SH3 domain binding protein 5 ENSG00000131370 NA
This gene encodes a member of the peptidyl-prolyl cis-trans isomerase (PPIase) family. PPIases catalyze the cis-trans isomerization of proline imidic peptide bonds in oligopeptides and accelerate the folding of proteins. The encoded protein is a cyclosporin binding-protein and may play a role in cyclosporin A-mediated immunosuppression. The protein can also interact with several HIV proteins, including p55 gag, Vpr, and capsid protein, and has been shown to be necessary for the formation of infectious HIV virions. Multiple pseudogenes that map to different chromosomes have been reported. 5478 PPIA peptidylprolyl isomerase A ENSG00000196262 NA
The protein encoded by this gene is a transcriptional regulator that binds as a homodimer to activating transcription factor (ATF) sites in many cellular and viral promoters. The encoded protein represses PER1 and PER2 expression and therefore plays a role in the regulation of circadian rhythm. Three transcript variants encoding the same protein have been found for this gene. 4783 NFIL3 nuclear factor, interleukin 3 regulated ENSG00000165030 NA
This gene encodes one of the SERCA Ca(2+)-ATPases, which are intracellular pumps located in the sarcoplasmic or endoplasmic reticula of muscle cells. This enzyme catalyzes the hydrolysis of ATP coupled with the translocation of calcium from the cytosol to the sarcoplasmic reticulum lumen, and is involved in calcium sequestration associated with muscular excitation and contraction. Alternative splicing results in multiple transcript variants encoding different isoforms. 489 ATP2A3 ATPase sarcoplasmic/endoplasmic reticulum Ca2+ transporting 3 ENSG00000074370 NA
This gene encodes a member of the cysteine-rich protein (CSRP) family. This gene family includes a group of LIM domain proteins, which may be involved in regulatory processes important for development and cellular differentiation. The LIM/double zinc-finger motif found in this gene product occurs in proteins with critical functions in gene regulation, cell growth, and somatic differentiation. Alternatively spliced transcript variants have been described. 1465 CSRP1 cysteine and glycine rich protein 1 ENSG00000159176 NA
This gene encodes a lysine-specific histone demethylase that belongs to the jumonji/ARID domain-containing family of histone demethylases. The encoded protein is capable of demethylating tri-, di- and monomethylated lysine 4 of histone H3. This protein plays a role in the transcriptional repression of certain tumor suppressor genes and is upregulated in certain cancer cells. This protein may also play a role in genome stability and DNA repair. Alternate splicing results in multiple transcript variants. 10765 KDM5B lysine demethylase 5B ENSG00000117139 NA
NA 57561 ARRDC3 arrestin domain containing 3 ENSG00000113369 NA
Cytochrome c oxidase (COX), the terminal component of the mitochondrial respiratory chain, catalyzes the electron transfer from reduced cytochrome c to oxygen. This component is a heteromeric complex consisting of 3 catalytic subunits encoded by mitochondrial genes and multiple structural subunits encoded by nuclear genes. The mitochondrially-encoded subunits function in electron transfer, and the nuclear-encoded subunits may function in the regulation and assembly of the complex. This nuclear gene encodes subunit VIIc, which shares 87% and 85% amino acid sequence identity with mouse and bovine COX VIIc, respectively, and is found in all tissues. A pseudogene COX7CP1 has been found on chromosome 13. 1350 COX7C cytochrome c oxidase subunit 7C ENSG00000127184 NA
This gene encodes an integral membrane protein containing four transmembrane regions and a C-terminal cytoplasmic tail that is tyrosine phosphorylated. The exact function of this protein is unclear, but studies of a similar rat protein suggest that it may play a role in regulating membrane traffic in non-neuronal cells. The gene belongs to the synaptogyrin gene family. Alternative splicing results in multiple transcript variants. 9144 SYNGR2 synaptogyrin 2 ENSG00000108639 NA
NA 6515 SLC2A3 solute carrier family 2 member 3 ENSG00000059804 NA
NA 84898 PLXDC2 plexin domain containing 2 ENSG00000120594 NA
This gene encodes a transcription factor that binds to the sterol regulatory element-1 (SRE1), which is a decamer flanking the low density lipoprotein receptor gene and some genes involved in sterol biosynthesis. The protein is synthesized as a precursor that is attached to the nuclear membrane and endoplasmic reticulum. Following cleavage, the mature protein translocates to the nucleus and activates transcription by binding to the SRE1. Sterols inhibit the cleavage of the precursor, and the mature nuclear form is rapidly catabolized, thereby reducing transcription. The protein is a member of the basic helix-loop-helix-leucine zipper (bHLH-Zip) transcription factor family. This gene is located within the Smith-Magenis syndrome region on chromosome 17. 6720 SREBF1 sterol regulatory element binding transcription factor 1 ENSG00000072310 NA
This gene encodes a member of the Kruppel-like family of transcription factors. The zinc finger protein is a transcriptional activator, and functions as a tumor suppressor. Multiple transcript variants encoding different isoforms have been found for this gene, some of which are implicated in carcinogenesis. 1316 KLF6 Kruppel like factor 6 ENSG00000067082 NA
NA 58191 CXCL16 C-X-C motif chemokine ligand 16 ENSG00000161921 NA
Cytochrome c oxidase (COX) is the terminal enzyme of the mitochondrial respiratory chain. It is a multi-subunit enzyme complex that couples the transfer of electrons from cytochrome c to molecular oxygen and contributes to a proton electrochemical gradient across the inner mitochondrial membrane. The complex consists of 13 mitochondrial- and nuclear-encoded subunits. The mitochondrially-encoded subunits perform the electron transfer and proton pumping activities. The functions of the nuclear-encoded subunits are unknown but they may play a role in the regulation and assembly of the complex. This gene encodes the nuclear-encoded subunit IV isoform 1 of the human mitochondrial respiratory chain enzyme. It is located at the 3’ of the NOC4 (neighbor of COX4) gene in a head-to-head orientation, and shares a promoter with it. Pseudogenes related to this gene are located on chromosomes 13 and 14. Alternative splicing results in multiple transcript variants encoding different isoforms. 1327 COX4I1 cytochrome c oxidase subunit 4I1 ENSG00000131143 NA
Amino acid transporters play essential roles in the uptake of nutrients, production of energy, chemical metabolism, detoxification, and neurotransmitter cycling. SLC38A1 is an important transporter of glutamine, an intermediate in the detoxification of ammonia and the production of urea. Glutamine serves as a precursor for the synaptic transmitter, glutamate (Gu et al., 2001 [PubMed 11325958]). 81539 SLC38A1 solute carrier family 38 member 1 ENSG00000111371 NA
This gene encodes an enzyme involved in fatty acid biosynthesis, primarily the synthesis of oleic acid. The protein belongs to the fatty acid desaturase family and is an integral membrane protein located in the endoplasmic reticulum. Transcripts of approximately 3.9 and 5.2 kb, differing only by alternative polyadenylation signals, have been detected. A gene encoding a similar enzyme is located on chromosome 4 and a pseudogene of this gene is located on chromosome 17. 6319 SCD stearoyl-CoA desaturase ENSG00000099194 NA
This gene encodes a membrane-bound protein that is a member of the mucin family. Mucins are O-glycosylated proteins that play an essential role in forming protective mucous barriers on epithelial surfaces. These proteins also play a role in intracellular signaling. This protein is expressed on the apical surface of epithelial cells that line the mucosal surfaces of many different tissues including lung, breast, stomach and pancreas. This protein is proteolytically cleaved into alpha and beta subunits that form a heterodimeric complex. The N-terminal alpha subunit functions in cell-adhesion and the C-terminal beta subunit is involved in cell signaling. Overexpression, aberrant intracellular localization, and changes in glycosylation of this protein have been associated with carcinomas. This gene is known to contain a highly polymorphic variable number tandem repeats (VNTR) domain. Alternate splicing results in multiple transcript variants. 4582 MUC1 mucin 1, cell surface associated ENSG00000185499 NA
This gene encodes a member of the DnaJ or Hsp40 (heat shock protein 40 kD) family of proteins. DNAJ family members are characterized by a highly conserved amino acid stretch called the ‘J-domain’ and function as one of the two major classes of molecular chaperones involved in a wide range of cellular events, such as protein folding and oligomeric protein complex assembly. The encoded protein is a molecular chaperone that stimulates the ATPase activity of Hsp70 heat-shock proteins in order to promote protein folding and prevent misfolded protein aggregation. Alternative splicing results in multiple transcript variants. 3337 DNAJB1 DnaJ heat shock protein family (Hsp40) member B1 ENSG00000132002 NA
This gene encodes beta-tropomyosin, a member of the actin filament binding protein family, and mainly expressed in slow, type 1 muscle fibers. Mutations in this gene can alter the expression of other sarcomeric tropomyosin proteins, and cause cap disease, nemaline myopathy and distal arthrogryposis syndromes. Alternatively spliced transcript variants encoding different isoforms have been found for this gene. 7169 TPM2 tropomyosin 2 (beta) ENSG00000198467 NA
This gene belongs to the chemokine-like factor gene superfamily, a novel family that links the chemokine and the transmembrane 4 superfamilies of signaling molecules. The protein encoded by this gene may play an important role in testicular development. 146225 CMTM2 CKLF like MARVEL transmembrane domain containing 2 ENSG00000140932 NA
This gene encodes a member of the myotubularin dual specificity protein phosphatase gene family. The encoded protein is structurally similar to myotubularin but in addition contains a FYVE domain and an N-terminal PH-GRAM domain. The protein can self-associate and also form heteromers with another myotubularin related protein. The protein binds to phosphoinositide lipids through the PH-GRAM domain, and can hydrolyze phosphatidylinositol(3)-phosphate and phosphatidylinositol(3,5)-bisphosphate in vitro. The encoded protein has been observed to have a perinuclear, possibly membrane-bound, distribution in cells, but it has also been found free in the cytoplasm. Multiple transcript variants encoding different isoforms have been found for this gene. 8897 MTMR3 myotubularin related protein 3 ENSG00000100330 NA
This gene encodes a member of the small leucine-rich proteoglycan (SLRP) family that includes decorin, biglycan, fibromodulin, keratocan, epiphycan, and osteoglycin. In these bifunctional molecules, the protein moiety binds collagen fibrils and the highly charged hydrophilic glycosaminoglycans regulate interfibrillar spacings. Lumican is the major keratan sulfate proteoglycan of the cornea but is also distributed in interstitial collagenous matrices throughout the body. Lumican may regulate collagen fibril organization and circumferential growth, corneal transparency, and epithelial cell migration and tissue repair. 4060 LUM lumican ENSG00000139329 NA
NA 92840 REEP6 receptor accessory protein 6 ENSG00000115255 NA
Ribosomes, the organelles that catalyze protein synthesis, consist of a small 40S subunit and a large 60S subunit. Together these subunits are composed of 4 RNA species and approximately 80 structurally distinct proteins. This gene encodes a ribosomal protein that is a component of the 60S subunit. The protein belongs to the L15P family of ribosomal proteins. It is located in the cytoplasm. Variable expression of this gene in colorectal cancers compared to adjacent normal tissues has been observed, although no correlation between the level of expression and the severity of the disease has been found. As is typical for genes encoding ribosomal proteins, multiple processed pseudogenes derived from this gene are dispersed through the genome. 6157 RPL27A ribosomal protein L27a ENSG00000166441 NA
The product of this gene is a membrane-associated protein that functions in clathrin-mediated endocytosis and protein trafficking within the cell. The encoded protein binds to the huntingtin protein in the brain; this interaction is lost in Huntington’s disease. Alternative splicing results in multiple transcript variants. 3092 HIP1 huntingtin interacting protein 1 ENSG00000127946 NA
This gene encodes fibronectin, a glycoprotein present in a soluble dimeric form in plasma, and in a dimeric or multimeric form at the cell surface and in extracellular matrix. The encoded preproprotein is proteolytically processed to generate the mature protein. Fibronectin is involved in cell adhesion and migration processes including embryogenesis, wound healing, blood coagulation, host defense, and metastasis. The gene has three regions subject to alternative splicing, with the potential to produce 20 different transcript variants, at least one of which encodes an isoform that undergoes proteolytic processing. The full-length nature of some variants has not been determined. 2335 FN1 fibronectin 1 ENSG00000115414 NA
This gene encodes a member of the sestrin family of stress-induced proteins. The encoded protein reduces the levels of intracellular reactive oxygen species induced by activated Ras downstream of RAC-alpha serine/threonine-protein kinase (Akt) and FoxO transcription factor. The protein is required for normal regulation of blood glucose, insulin resistance and plays a role in lipid storage in obesity. Alternative splicing results in multiple transcript variants. 143686 SESN3 sestrin 3 ENSG00000149212 NA
NA 7091 TLE4 transducin like enhancer of split 4 ENSG00000106829 NA
The protein encoded by this gene is a member of the Zfh1 family of 2-handed zinc finger/homeodomain proteins. It is located in the nucleus and functions as a DNA-binding transcriptional repressor that interacts with activated SMADs. Mutations in this gene are associated with Hirschsprung disease/Mowat-Wilson syndrome. Alternatively spliced transcript variants have been found for this gene. 9839 ZEB2 zinc finger E-box binding homeobox 2 ENSG00000169554 NA
NA ENSG00000211675 IGLC1 immunoglobulin lambda constant 1 (Mcg marker) ENSG00000211675 NA
NA NA NA NA ENSG00000090920 TRUE
This gene encodes one of the immunoglobulin lambda-like polypeptides. It is located within the immunoglobulin lambda locus but it does not require somatic rearrangement for expression. The first exon of this gene is unrelated to immunoglobulin variable genes; the second and third exons are the immunoglobulin lambda joining 1 and the immunoglobulin lambda constant 1 gene segments. Alternative splicing results in multiple transcript variants. 100423062 IGLL5 immunoglobulin lambda like polypeptide 5 ENSG00000254709 NA
The protein encoded by this gene catalyzes the transport of phosphate into the mitochondrial matrix, either by proton cotransport or in exchange for hydroxyl ions. The protein contains three related segments arranged in tandem which are related to those found in other characterized members of the mitochondrial carrier family. Both the N-terminal and C-terminal regions of this protein protrude toward the cytosol. Multiple alternatively spliced transcript variants have been isolated. 5250 SLC25A3 solute carrier family 25 member 3 ENSG00000075415 NA
This gene encodes the mitochondrial enzyme which catalyzes the rate-limiting step in heme (iron-protoporphyrin) biosynthesis. The enzyme encoded by this gene is the housekeeping enzyme; a separate gene encodes a form of the enzyme that is specific for erythroid tissue. The level of the mature encoded protein is regulated by heme: high levels of heme down-regulate the mature enzyme in mitochondria while low heme levels up-regulate. A pseudogene of this gene is located on chromosome 12. Alternative splicing results in multiple transcript variants encoding different isoforms. 211 ALAS1 5’-aminolevulinate synthase 1 ENSG00000023330 NA
This gene is a member of the Regulator of Complement Activation (RCA) gene cluster and encodes a protein with twenty short consensus repeat (SCR) domains. This protein is secreted into the bloodstream and has an essential role in the regulation of complement activation, restricting this innate defense mechanism to microbial infections. Mutations in this gene have been associated with hemolytic-uremic syndrome (HUS) and chronic hypocomplementemic nephropathy. Alternate transcriptional splice variants, encoding different isoforms, have been characterized. 3075 CFH complement factor H ENSG00000000971 NA
The protein encoded by this gene is secreted and is a serine protease inhibitor whose targets include elastase, plasmin, thrombin, trypsin, chymotrypsin, and plasminogen activator. Defects in this gene can cause emphysema or liver disease. Several transcript variants encoding the same protein have been found for this gene. 5265 SERPINA1 serpin family A member 1 ENSG00000197249 NA
This gene is a member of the NADH dehydrogenase (ubiquinone) iron-sulfur protein family. The encoded protein is a subunit of the NADH:ubiquinone oxidoreductase (complex I), the first enzyme complex in the electron transport chain located in the inner mitochondrial membrane. Alternative splicing results in multiple transcript variants and pseudogenes have been identified on chromosomes 1, 4 and 17. 4725 NDUFS5 NADH:ubiquinone oxidoreductase subunit S5 ENSG00000168653 NA
This gene is located in an imprinted region of chromosome 11 near the insulin-like growth factor 2 (IGF2) gene. This gene is only expressed from the maternally-inherited chromosome, whereas IGF2 is only expressed from the paternally-inherited chromosome. The product of this gene is a long non-coding RNA which functions as a tumor suppressor. Mutations in this gene have been associated with Beckwith-Wiedemann Syndrome and Wilms tumorigenesis. Alternative splicing results in multiple transcript variants. 283120 H19 H19, imprinted maternally expressed transcript (non-protein coding) ENSG00000130600 NA
This gene encodes a basic helix-loop-helix protein expressed in various tissues. The encoded protein can interact with ARNTL or compete for E-box binding sites in the promoter of PER1 and repress CLOCK/ARNTL’s transactivation of PER1. This gene is believed to be involved in the control of circadian rhythm and cell differentiation. 8553 BHLHE40 basic helix-loop-helix family member e40 ENSG00000134107 NA
NA 8420 SNHG3 small nucleolar RNA host gene 3 ENSG00000242125 NA
This gene belongs to the family of reticulon encoding genes. Reticulons are associated with the endoplasmic reticulum, and are involved in neuroendocrine secretion or in membrane trafficking in neuroendocrine cells. The product of this gene is a potent neurite outgrowth inhibitor which may also help block the regeneration of the central nervous system in higher vertebrates. Alternatively spliced transcript variants derived both from differential splicing and differential promoter usage and encoding different isoforms have been identified. 57142 RTN4 reticulon 4 ENSG00000115310 NA
write.table(as.factor(out$query), paste0("../utilities/GTEX2013_sparse_fac_sqrt/gene_names_clus_",5,".txt"), col.names = FALSE,
            row.names=FALSE, quote=FALSE);
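
The Factor 5 table above includes one query (ENSG00000090920) that mygene flagged as notfound, and the write.table() call writes every query ID regardless of that flag. If only annotated genes should reach the downstream file, the unmatched rows could be dropped first. The following is a minimal sketch, not the workflow used in this analysis: the toy data frame stands in for as.data.frame(out), keep_matched() is a hypothetical helper, and a temporary file replaces the project path.

```r
# Toy stand-in for as.data.frame(out); only the columns needed here are included.
out_df <- data.frame(
  query    = c("ENSG00000211675", "ENSG00000090920", "ENSG00000254709"),
  symbol   = c("IGLC1", NA, "IGLL5"),
  notfound = c(NA, TRUE, NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep rows whose 'notfound' flag is missing or not TRUE.
# If the query returned no 'notfound' column at all, every row is kept.
keep_matched <- function(df) {
  if (!"notfound" %in% names(df)) return(df)
  df[is.na(df$notfound) | !df$notfound, , drop = FALSE]
}

matched <- keep_matched(out_df)

# IDs that were dropped, kept for reporting.
unmatched_ids <- setdiff(out_df$query, matched$query)

# Write only the matched IDs, one per line (temporary file used for illustration).
out_file <- tempfile(fileext = ".txt")
write.table(matched$query, out_file, col.names = FALSE,
            row.names = FALSE, quote = FALSE)
```

Whether to drop these rows depends on the downstream tool: if it resolves Ensembl IDs itself, keeping the full list as in the original call may be preferable.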

Factor 6 Annotations

out <- mygene::queryMany(gene_list[6,],  scopes="ensembl.gene", fields=c("name", "summary", "symbol"), species="human");
## Finished
## Pass returnall=TRUE to return lists of duplicate or missing query terms.
kable(as.data.frame(out))
query symbol summary X_id name notfound
ENSG00000244734 HBB The alpha (HBA) and beta (HBB) loci determine the structure of the 2 types of polypeptide chains in adult hemoglobin, Hb A. The normal adult hemoglobin tetramer consists of two alpha chains and two beta chains. Mutant beta globin causes sickle cell anemia. Absence of beta chain causes beta-zero-thalassemia. Reduced amounts of detectable beta globin causes beta-plus-thalassemia. The order of the genes in the beta-globin cluster is 5’-epsilon – gamma-G – gamma-A – delta – beta–3’. 3043 hemoglobin subunit beta NA
ENSG00000168542 COL3A1 This gene encodes the pro-alpha1 chains of type III collagen, a fibrillar collagen that is found in extensible connective tissues such as skin, lung, uterus, intestine and the vascular system, frequently in association with type I collagen. Mutations in this gene are associated with Ehlers-Danlos syndrome type IV, and with aortic and arterial aneurysms. Two transcripts, resulting from the use of alternate polyadenylation signals, have been identified for this gene. 1281 collagen type III alpha 1 chain NA
ENSG00000167768 KRT1 The protein encoded by this gene is a member of the keratin gene family. The type II cytokeratins consist of basic or neutral proteins which are arranged in pairs of heterotypic keratin chains coexpressed during differentiation of simple and stratified epithelial tissues. This type II cytokeratin is specifically expressed in the spinous and granular layers of the epidermis with family member KRT10 and mutations in these genes have been associated with bullous congenital ichthyosiform erythroderma. The type II cytokeratins are clustered in a region of chromosome 12q12-q13. 3848 keratin 1 NA
ENSG00000162896 PIGR This gene is a member of the immunoglobulin superfamily. The encoded poly-Ig receptor binds polymeric immunoglobulin molecules at the basolateral surface of epithelial cells; the complex is then transported across the cell to be secreted at the apical surface. A significant association was found between immunoglobulin A nephropathy and several SNPs in this gene. 5284 polymeric immunoglobulin receptor NA
ENSG00000159251 ACTC1 Actins are highly conserved proteins that are involved in various types of cell motility. Polymerization of globular actin (G-actin) leads to a structural filament (F-actin) in the form of a two-stranded helix. Each actin can bind to four others. The protein encoded by this gene belongs to the actin family which is comprised of three main groups of actin isoforms, alpha, beta, and gamma. The alpha actins are found in muscle tissues and are a major constituent of the contractile apparatus. Defects in this gene have been associated with idiopathic dilated cardiomyopathy (IDC) and familial hypertrophic cardiomyopathy (FHC). 70 actin, alpha, cardiac muscle 1 NA
ENSG00000170323 FABP4 FABP4 encodes the fatty acid binding protein found in adipocytes. Fatty acid binding proteins are a family of small, highly conserved, cytoplasmic proteins that bind long-chain fatty acids and other hydrophobic ligands. It is thought that FABPs’ roles include fatty acid uptake, transport, and metabolism. 2167 fatty acid binding protein 4 NA
ENSG00000059804 SLC2A3 NA 6515 solute carrier family 2 member 3 NA
ENSG00000139112 GABARAPL1 NA 23710 GABA type A receptor associated protein like 1 NA
ENSG00000079308 TNS1 The protein encoded by this gene localizes to focal adhesions, regions of the plasma membrane where the cell attaches to the extracellular matrix. This protein crosslinks actin filaments and contains a Src homology 2 (SH2) domain, which is often found in molecules involved in signal transduction. This protein is a substrate of calpain II. Alternative splicing results in multiple transcript variants encoding different isoforms. 7145 tensin 1 NA
ENSG00000110799 VWF This gene encodes a glycoprotein involved in hemostasis. The encoded preproprotein is proteolytically processed following assembly into large multimeric complexes. These complexes function in the adhesion of platelets to sites of vascular injury and the transport of various proteins in the blood. Mutations in this gene result in von Willebrand disease, an inherited bleeding disorder. An unprocessed pseudogene has been found on chromosome 22. 7450 von Willebrand factor NA
ENSG00000211890 IGHA2 NA ENSG00000211890 immunoglobulin heavy constant alpha 2 (A2m marker) NA
ENSG00000096384 HSP90AB1 This gene encodes a member of the heat shock protein 90 family; these proteins are involved in signal transduction, protein folding and degradation and morphological evolution. This gene encodes the constitutive form of the cytosolic 90 kDa heat-shock protein and is thought to play a role in gastric apoptosis and inflammation. Alternative splicing results in multiple transcript variants. Pseudogenes have been identified on multiple chromosomes. 3326 heat shock protein 90kDa alpha family class B member 1 NA
ENSG00000175084 DES This gene encodes a muscle-specific class III intermediate filament. Homopolymers of this protein form a stable intracytoplasmic filamentous network connecting myofibrils to each other and to the plasma membrane. Mutations in this gene are associated with desmin-related myopathy, a familial cardiac and skeletal myopathy (CSM), and with distal myopathies. 1674 desmin NA
ENSG00000075624 ACTB This gene encodes one of six different actin proteins. Actins are highly conserved proteins that are involved in cell motility, structure, and integrity. This actin is a major constituent of the contractile apparatus and one of the two nonmuscle cytoskeletal actins. 60 actin, beta NA
ENSG00000183087 GAS6 This gene encodes a gamma-carboxyglutamic acid (Gla)-containing protein thought to be involved in the stimulation of cell proliferation. This gene is frequently overexpressed in many cancers and has been implicated as an adverse prognostic marker. Elevated protein levels are additionally associated with a variety of disease states, including venous thromboembolic disease, systemic lupus erythematosus, chronic renal failure, and preeclampsia. 2621 growth arrest specific 6 NA
ENSG00000134107 BHLHE40 This gene encodes a basic helix-loop-helix protein expressed in various tissues. The encoded protein can interact with ARNTL or compete for E-box binding sites in the promoter of PER1 and repress CLOCK/ARNTL’s transactivation of PER1. This gene is believed to be involved in the control of circadian rhythm and cell differentiation. 8553 basic helix-loop-helix family member e40 NA
ENSG00000204983 PRSS1 This gene encodes a trypsinogen, which is a member of the trypsin family of serine proteases. This enzyme is secreted by the pancreas and cleaved to its active form in the small intestine. It is active on peptide linkages involving the carboxyl group of lysine or arginine. Mutations in this gene are associated with hereditary pancreatitis. This gene and several other trypsinogen genes are localized to the T cell receptor beta locus on chromosome 7. 5644 protease, serine 1 NA
ENSG00000169710 FASN The enzyme encoded by this gene is a multifunctional protein. Its main function is to catalyze the synthesis of palmitate from acetyl-CoA and malonyl-CoA, in the presence of NADPH, into long-chain saturated fatty acids. In some cancer cell lines, this protein has been found to be fused with estrogen receptor-alpha (ER-alpha), in which the N-terminus of FAS is fused in-frame with the C-terminus of ER-alpha. 2194 fatty acid synthase NA
ENSG00000163431 LMOD1 The leiomodin 1 protein has a putative membrane-spanning region and 2 types of tandemly repeated blocks. The transcript is expressed in all tissues tested, with the highest levels in thyroid, eye muscle, skeletal muscle, and ovary. Increased expression of leiomodin 1 may be linked to Graves’ disease and thyroid-associated ophthalmopathy. 25802 leiomodin 1 NA
ENSG00000104879 CKM The protein encoded by this gene is a cytoplasmic enzyme involved in energy homeostasis and is an important serum marker for myocardial infarction. The encoded protein reversibly catalyzes the transfer of phosphate between ATP and various phosphogens such as creatine phosphate. It acts as a homodimer in striated muscle as well as in other tissues, and as a heterodimer with a similar brain isozyme in heart. The encoded protein is a member of the ATP:guanido phosphotransferase protein family. 1158 creatine kinase, M-type NA
ENSG00000101187 SLCO4A1 NA 28231 solute carrier organic anion transporter family member 4A1 NA
ENSG00000144381 HSPD1 This gene encodes a member of the chaperonin family. The encoded mitochondrial protein may function as a signaling molecule in the innate immune system. This protein is essential for the folding and assembly of newly imported proteins in the mitochondria. This gene is adjacent to a related family member and the region between the 2 genes functions as a bidirectional promoter. Several pseudogenes have been associated with this gene. Two transcript variants encoding the same protein have been identified for this gene. Mutations associated with this gene cause autosomal recessive spastic paraplegia 13. 3329 heat shock protein family D (Hsp60) member 1 NA
ENSG00000196091 MYBPC1 This gene encodes a member of the myosin-binding protein C family. Myosin-binding protein C family members are myosin-associated proteins found in the cross-bridge-bearing zone (C region) of A bands in striated muscle. The encoded protein is the slow skeletal muscle isoform of myosin-binding protein C and plays an important role in muscle contraction by recruiting muscle-type creatine kinase to myosin filaments. Mutations in this gene are associated with distal arthrogryposis type I. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene. 4604 myosin binding protein C, slow type NA
ENSG00000148677 ANKRD1 The protein encoded by this gene is localized to the nucleus of endothelial cells and is induced by IL-1 and TNF-alpha stimulation. Studies in rat cardiomyocytes suggest that this gene functions as a transcription factor. Interactions between this protein and the sarcomeric proteins myopalladin and titin suggest that it may also be involved in the myofibrillar stretch-sensor system. 27063 ankyrin repeat domain 1 NA
ENSG00000149591 TAGLN The protein encoded by this gene is a transformation and shape-change sensitive actin cross-linking/gelling protein found in fibroblasts and smooth muscle. Its expression is down-regulated in many cell lines, and this down-regulation may be an early and sensitive marker for the onset of transformation. A functional role of this protein is unclear. Two transcript variants encoding the same protein have been found for this gene. 6876 transgelin NA
ENSG00000166825 ANPEP Aminopeptidase N is located in the small-intestinal and renal microvillar membrane, and also in other plasma membranes. In the small intestine aminopeptidase N plays a role in the final digestion of peptides generated from hydrolysis of proteins by gastric and pancreatic proteases. Its function in proximal tubular epithelial cells and other cell types is less clear. The large extracellular carboxyterminal domain contains a pentapeptide consensus sequence characteristic of members of the zinc-binding metalloproteinase superfamily. Sequence comparisons with known enzymes of this class showed that CD13 and aminopeptidase N are identical. The latter enzyme was thought to be involved in the metabolism of regulatory peptides by diverse cell types, including small intestinal and renal tubular epithelial cells, macrophages, granulocytes, and synaptic membranes from the CNS. Human aminopeptidase N is a receptor for one strain of human coronavirus that is an important cause of upper respiratory tract infections. Defects in this gene appear to be a cause of various types of leukemia or lymphoma. 290 alanyl aminopeptidase, membrane NA
ENSG00000091704 CPA1 This gene encodes a member of the carboxypeptidase A family of zinc metalloproteases. This enzyme is produced in the pancreas and preferentially cleaves C-terminal branched-chain and aromatic amino acids from dietary proteins. This gene and several family members are present in a gene cluster on chromosome 7. Mutations in this gene may be linked to chronic pancreatitis, while elevated protein levels may be associated with pancreatic cancer. 1357 carboxypeptidase A1 NA
ENSG00000119508 NR4A3 This gene encodes a member of the steroid-thyroid hormone-retinoid receptor superfamily. The encoded protein may act as a transcriptional activator. The protein can efficiently bind the NGFI-B Response Element (NBRE). Three different versions of extraskeletal myxoid chondrosarcomas (EMCs) are the result of reciprocal translocations between this gene and other genes. The translocation breakpoints are associated with Nuclear Receptor Subfamily 4, Group A, Member 3 (on chromosome 9) and either Ewing Sarcoma Breakpoint Region 1 (on chromosome 22), RNA Polymerase II, TATA Box-Binding Protein-Associated Factor, 68-KD (on chromosome 17), or Transcription factor 12 (on chromosome 15). Multiple transcript variants encoding different isoforms have been found for this gene. 8013 nuclear receptor subfamily 4 group A member 3 NA
ENSG00000143416 SELENBP1 This gene encodes a member of the selenium-binding protein family. Selenium is an essential nutrient that exhibits potent anticarcinogenic properties, and deficiency of selenium may cause certain neurologic diseases. The effects of selenium in preventing cancer and neurologic diseases may be mediated by selenium-binding proteins, and decreased expression of this gene may be associated with several types of cancer. The encoded protein may play a selenium-dependent role in ubiquitination/deubiquitination-mediated protein degradation. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene. 8991 selenium binding protein 1 NA
ENSG00000090920 NA NA NA NA TRUE
ENSG00000115386 REG1A This gene is a type I subclass member of the Reg gene family. The Reg gene family is a multigene family grouped into four subclasses, types I, II, III and IV, based on the primary structures of the encoded proteins. This gene encodes a protein that is secreted by the exocrine pancreas. It is associated with islet cell regeneration and diabetogenesis and may be involved in pancreatic lithogenesis. Reg family members REG1B, REGL, PAP and this gene are tandemly clustered on chromosome 2p12 and may have arisen from the same ancestral gene by gene duplication. 5967 regenerating family member 1 alpha NA
ENSG00000204628 RACK1 NA 10399 receptor for activated C kinase 1 NA
ENSG00000170027 YWHAG This gene product belongs to the 14-3-3 family of proteins which mediate signal transduction by binding to phosphoserine-containing proteins. This highly conserved protein family is found in both plants and mammals, and this protein is 100% identical to the rat ortholog. It is induced by growth factors in human vascular smooth muscle cells, and is also highly expressed in skeletal and heart muscles, suggesting an important role for this protein in muscle tissue. It has been shown to interact with RAF1 and protein kinase C, proteins involved in various signal transduction pathways. 7532 tyrosine 3-monooxygenase/tryptophan 5-monooxygenase activation protein gamma NA
ENSG00000186395 KRT10 This gene encodes a member of the type I (acidic) cytokeratin family, which belongs to the superfamily of intermediate filament (IF) proteins. Keratins are heteropolymeric structural proteins which form the intermediate filament. These filaments, along with actin microfilaments and microtubules, compose the cytoskeleton of epithelial cells. Mutations in this gene are associated with epidermolytic hyperkeratosis. This gene is located within a cluster of keratin family members on chromosome 17q21. 3858 keratin 10 NA
ENSG00000068976 PYGM This gene encodes a muscle enzyme involved in glycogenolysis. Highly similar enzymes encoded by different genes are found in liver and brain. Mutations in this gene are associated with McArdle disease (myophosphorylase deficiency), a glycogen storage disease of muscle. Alternative splicing results in multiple transcript variants. 5837 phosphorylase, glycogen, muscle NA
ENSG00000140403 DNAJA4 NA 55466 DnaJ heat shock protein family (Hsp40) member A4 NA
ENSG00000168209 DDIT4 NA 54541 DNA damage inducible transcript 4 NA
ENSG00000175535 PNLIP This gene is a member of the lipase gene family. It encodes a carboxyl esterase that hydrolyzes insoluble, emulsified triglycerides, and is essential for the efficient digestion of dietary fats. This gene is expressed specifically in the pancreas. 5406 pancreatic lipase NA
ENSG00000171401 KRT13 The protein encoded by this gene is a member of the keratin gene family. The keratins are intermediate filament proteins responsible for the structural integrity of epithelial cells and are subdivided into cytokeratins and hair keratins. Most of the type I cytokeratins consist of acidic proteins which are arranged in pairs of heterotypic keratin chains. This type I cytokeratin is paired with keratin 4 and expressed in the suprabasal layers of non-cornified stratified epithelia. Mutations in this gene and keratin 4 have been associated with the autosomal dominant disorder White Sponge Nevus. The type I cytokeratins are clustered in a region of chromosome 17q21.2. Alternative splicing of this gene results in multiple transcript variants; however, not all variants have been described. 3860 keratin 13 NA
ENSG00000105220 GPI This gene encodes a member of the glucose phosphate isomerase protein family. The encoded protein has been identified as a moonlighting protein based on its ability to perform mechanistically distinct functions. In the cytoplasm, the gene product functions as a glycolytic enzyme (glucose-6-phosphate isomerase) that interconverts glucose-6-phosphate and fructose-6-phosphate. Extracellularly, the encoded protein (also referred to as neuroleukin) functions as a neurotrophic factor that promotes survival of skeletal motor neurons and sensory neurons, and as a lymphokine that induces immunoglobulin secretion. The encoded protein is also referred to as autocrine motility factor based on an additional function as a tumor-secreted cytokine and angiogenic factor. Defects in this gene are the cause of nonspherocytic hemolytic anemia and a severe enzyme deficiency can be associated with hydrops fetalis, immediate neonatal death and neurological impairment. Alternative splicing results in multiple transcript variants. 2821 glucose-6-phosphate isomerase NA
ENSG00000170477 KRT4 The protein encoded by this gene is a member of the keratin gene family. The type II cytokeratins consist of basic or neutral proteins which are arranged in pairs of heterotypic keratin chains coexpressed during differentiation of simple and stratified epithelial tissues. This type II cytokeratin is specifically expressed in differentiated layers of the mucosal and esophageal epithelia with family member KRT13. Mutations in these genes have been associated with White Sponge Nevus, characterized by oral, esophageal, and anal leukoplakia. The type II cytokeratins are clustered in a region of chromosome 12q12-q13. 3851 keratin 4 NA
ENSG00000135046 ANXA1 This gene encodes a membrane-localized protein that binds phospholipids. This protein inhibits phospholipase A2 and has anti-inflammatory activity. Loss of function or expression of this gene has been detected in multiple tumors. 301 annexin A1 NA
ENSG00000163631 ALB Albumin is a soluble, monomeric protein which comprises about one-half of the blood serum protein. Albumin functions primarily as a carrier protein for steroids, fatty acids, and thyroid hormones and plays a role in stabilizing extracellular fluid volume. Albumin is a globular unglycosylated serum protein of molecular weight 65,000. Albumin is synthesized in the liver as preproalbumin which has an N-terminal peptide that is removed before the nascent protein is released from the rough endoplasmic reticulum. The product, proalbumin, is in turn cleaved in the Golgi vesicles to produce the secreted albumin. 213 albumin NA
ENSG00000120049 KCNIP2 This gene encodes a member of the family of voltage-gated potassium (Kv) channel-interacting proteins (KCNIPs), which belongs to the recoverin branch of the EF-hand superfamily. Members of the KCNIP family are small calcium binding proteins. They all have EF-hand-like domains, and differ from each other in the N-terminus. They are integral subunit components of native Kv4 channel complexes. They may regulate A-type currents, and hence neuronal excitability, in response to changes in intracellular calcium. Multiple alternatively spliced transcript variants encoding distinct isoforms have been identified from this gene. 30819 potassium voltage-gated channel interacting protein 2 NA
ENSG00000142789 CELA3A Elastases form a subfamily of serine proteases that hydrolyze many proteins in addition to elastin. Humans have six elastase genes which encode the structurally similar proteins elastase 1, 2, 2A, 2B, 3A, and 3B. Unlike other elastases, elastase 3A has little elastolytic activity. Like most of the human elastases, elastase 3A is secreted from the pancreas as a zymogen and, like other serine proteases such as trypsin, chymotrypsin and kallikrein, it has a digestive function in the intestine. Elastase 3A preferentially cleaves proteins after alanine residues. Elastase 3A may also function in the intestinal transport and metabolism of cholesterol. Both elastase 3A and elastase 3B have been referred to as protease E and as elastase 1. 10136 chymotrypsin like elastase family member 3A NA
ENSG00000159388 BTG2 The protein encoded by this gene is a member of the BTG/Tob family. This family has structurally related proteins that appear to have antiproliferative properties. This encoded protein is involved in the regulation of the G1/S transition of the cell cycle. 7832 BTG family member 2 NA
ENSG00000107796 ACTA2 The protein encoded by this gene belongs to the actin family of proteins, which are highly conserved proteins that play a role in cell motility, structure and integrity. Alpha, beta and gamma actin isoforms have been identified, with alpha actins being a major constituent of the contractile apparatus, while beta and gamma actins are involved in the regulation of cell motility. This actin is an alpha actin that is found in skeletal muscle. Defects in this gene cause aortic aneurysm familial thoracic type 6. Multiple alternatively spliced variants, encoding the same protein, have been identified. 59 actin, alpha 2, smooth muscle, aorta NA
ENSG00000118503 TNFAIP3 This gene was identified as a gene whose expression is rapidly induced by the tumor necrosis factor (TNF). The protein encoded by this gene is a zinc finger protein and ubiquitin-editing enzyme, and has been shown to inhibit NF-kappa B activation as well as TNF-mediated apoptosis. The encoded protein, which has both ubiquitin ligase and deubiquitinase activities, is involved in the cytokine-mediated immune and inflammatory responses. Several transcript variants encoding the same protein have been found for this gene. 7128 TNF alpha induced protein 3 NA
ENSG00000147872 PLIN2 The protein encoded by this gene belongs to the perilipin family, members of which coat intracellular lipid storage droplets. This protein is associated with the lipid globule surface membrane material, and may be involved in development and maintenance of adipose tissue. However, it is not restricted to adipocytes as previously thought, but is found in a wide range of cultured cell lines, including fibroblasts, endothelial and epithelial cells, and tissues, such as lactating mammary gland, adrenal cortex, Sertoli and Leydig cells, and hepatocytes in alcoholic liver cirrhosis, suggesting that it may serve as a marker of lipid accumulation in diverse cell types and diseases. Alternatively spliced transcript variants have been found for this gene. 123 perilipin 2 NA
ENSG00000177469 PTRF This gene encodes a protein that enables the dissociation of paused ternary polymerase I transcription complexes from the 3’ end of pre-rRNA transcripts. This protein regulates rRNA transcription by promoting the dissociation of transcription complexes and the reinitiation of polymerase I on nascent rRNA transcripts. This protein also localizes to caveolae at the plasma membrane and is thought to play a critical role in the formation of caveolae and the stabilization of caveolins. This protein translocates from caveolae to the cytoplasm after insulin stimulation. Caveolae contain truncated forms of this protein and may be the site of phosphorylation-dependent proteolysis. This protein is also thought to modify lipid metabolism and insulin-regulated gene expression. Mutations in this gene result in a disorder characterized by generalized lipodystrophy and muscular dystrophy. 284119 polymerase I and transcript release factor NA
ENSG00000163017 ACTG2 Actins are highly conserved proteins that are involved in various types of cell motility and in the maintenance of the cytoskeleton. Three types of actins, alpha, beta and gamma, have been identified in vertebrates. Alpha actins are found in muscle tissues and are a major constituent of the contractile apparatus. The beta and gamma actins co-exist in most cell types as components of the cytoskeleton and as mediators of internal cell motility. This gene encodes actin gamma 2; a smooth muscle actin found in enteric tissues. Alternative splicing results in multiple transcript variants encoding distinct isoforms. Based on similarity to peptide cleavage of related actins, the mature protein of this gene is formed by removal of two N-terminal peptides. 72 actin, gamma 2, smooth muscle, enteric NA
ENSG00000159176 CSRP1 This gene encodes a member of the cysteine-rich protein (CSRP) family. This gene family includes a group of LIM domain proteins, which may be involved in regulatory processes important for development and cellular differentiation. The LIM/double zinc-finger motif found in this gene product occurs in proteins with critical functions in gene regulation, cell growth, and somatic differentiation. Alternatively spliced transcript variants have been described. 1465 cysteine and glycine rich protein 1 NA
ENSG00000177606 JUN This gene is the putative transforming gene of avian sarcoma virus 17. It encodes a protein which is highly similar to the viral protein, and which interacts directly with specific target DNA sequences to regulate gene expression. This gene is intronless and is mapped to 1p32-p31, a chromosomal region involved in both translocations and deletions in human malignancies. 3725 Jun proto-oncogene, AP-1 transcription factor subunit NA
ENSG00000171747 LGALS4 The galectins are a family of beta-galactoside-binding proteins implicated in modulating cell-cell and cell-matrix interactions. The expression of this gene is restricted to small intestine, colon, and rectum, and it is underexpressed in colorectal cancer. 3960 galectin 4 NA
ENSG00000169347 GP2 This gene encodes an integral membrane protein that is secreted from intracellular zymogen granules and associates with the plasma membrane via glycosylphosphatidylinositol (GPI) linkage. The encoded protein binds pathogens such as enterobacteria, thereby playing an important role in the innate immune response. The C-terminus of this protein is related to the C-terminus of the protein encoded by the neighboring gene, uromodulin (UMOD). Alternative splicing results in multiple transcript variants. 2813 glycoprotein 2 NA
ENSG00000144655 CSRNP1 This gene encodes a protein that localizes to the nucleus and expression of this gene is induced in response to elevated levels of axin. The Wnt signalling pathway, which is negatively regulated by axin, is important in axis formation in early development and impaired regulation of this signalling pathway is often involved in tumors. A decreased level of expression of this gene in tumors compared to the level of expression in their corresponding normal tissues suggests that this gene product has a tumor suppressor function. Alternative splicing results in multiple transcript variants. 64651 cysteine and serine rich nuclear protein 1 NA
ENSG00000132693 CRP The protein encoded by this gene belongs to the pentaxin family. It is involved in several host defense related functions based on its ability to recognize foreign pathogens and damaged cells of the host and to initiate their elimination by interacting with humoral and cellular effector systems in the blood. Consequently, the level of this protein in plasma increases greatly during acute phase response to tissue injury, infection, or other inflammatory stimuli. 1401 C-reactive protein, pentraxin-related NA
ENSG00000112715 VEGFA This gene is a member of the PDGF/VEGF growth factor family. It encodes a heparin-binding protein, which exists as a disulfide-linked homodimer. This growth factor induces proliferation and migration of vascular endothelial cells, and is essential for both physiological and pathological angiogenesis. Disruption of this gene in mice resulted in abnormal embryonic blood vessel formation. This gene is upregulated in many known tumors and its expression is correlated with tumor stage and progression. Elevated levels of this protein are found in patients with POEMS syndrome, also known as Crow-Fukase syndrome. Allelic variants of this gene have been associated with microvascular complications of diabetes 1 (MVCD1) and atherosclerosis. Alternatively spliced transcript variants encoding different isoforms have been described. There is also evidence for alternative translation initiation from upstream non-AUG (CUG) codons resulting in additional isoforms. A recent study showed that a C-terminally extended isoform is produced by use of an alternative in-frame translation termination codon via a stop codon readthrough mechanism, and that this isoform is antiangiogenic. Expression of some isoforms derived from the AUG start codon is regulated by a small upstream open reading frame, which is located within an internal ribosome entry site. 7422 vascular endothelial growth factor A NA
ENSG00000155657 TTN This gene encodes a large abundant protein of striated muscle. The product of this gene is divided into two regions, a N-terminal I-band and a C-terminal A-band. The I-band, which is the elastic part of the molecule, contains two regions of tandem immunoglobulin domains on either side of a PEVK region that is rich in proline, glutamate, valine and lysine. The A-band, which is thought to act as a protein-ruler, contains a mixture of immunoglobulin and fibronectin repeats, and possesses kinase activity. An N-terminal Z-disc region and a C-terminal M-line region bind to the Z-line and M-line of the sarcomere, respectively, so that a single titin molecule spans half the length of a sarcomere. Titin also contains binding sites for muscle associated proteins so it serves as an adhesion template for the assembly of contractile machinery in muscle cells. It has also been identified as a structural protein for chromosomes. Alternative splicing of this gene results in multiple transcript variants. Considerable variability exists in the I-band, the M-line and the Z-disc regions of titin. Variability in the I-band region contributes to the differences in elasticity of different titin isoforms and, therefore, to the differences in elasticity of different muscle types. Mutations in this gene are associated with familial hypertrophic cardiomyopathy 9, and autoantibodies to titin are produced in patients with the autoimmune disease scleroderma. 7273 titin NA
ENSG00000112936 C7 C7 is a component of the complement system. It participates in the formation of Membrane Attack Complex (MAC). People with C7 deficiency are prone to bacterial infection. 730 complement component 7 NA
ENSG00000197616 MYH6 Cardiac muscle myosin is a hexamer consisting of two heavy chain subunits, two light chain subunits, and two regulatory subunits. This gene encodes the alpha heavy chain subunit of cardiac myosin. The gene is located 4kb downstream of the gene encoding the beta heavy chain subunit of cardiac myosin. Mutations in this gene cause familial hypertrophic cardiomyopathy and atrial septal defect 3. 4624 myosin, heavy chain 6, cardiac muscle, alpha NA
ENSG00000150991 UBC This gene represents a ubiquitin gene, ubiquitin C. The encoded protein is a polyubiquitin precursor. Conjugation of ubiquitin monomers or polymers can lead to various effects within a cell, depending on the residues to which ubiquitin is conjugated. Ubiquitination has been associated with protein degradation, DNA repair, cell cycle regulation, kinase modification, endocytosis, and regulation of other cell signaling pathways. 7316 ubiquitin C NA
ENSG00000122786 CALD1 This gene encodes a calmodulin- and actin-binding protein that plays an essential role in the regulation of smooth muscle and nonmuscle contraction. The conserved domain of this protein possesses the binding activities to Ca(2+)-calmodulin, actin, tropomyosin, myosin, and phospholipids. This protein is a potent inhibitor of the actin-tropomyosin activated myosin MgATPase, and serves as a mediating factor for Ca(2+)-dependent inhibition of smooth muscle contraction. Alternative splicing of this gene results in multiple transcript variants encoding distinct isoforms. 800 caldesmon 1 NA
ENSG00000072110 ACTN1 Alpha actinins belong to the spectrin gene superfamily which represents a diverse group of cytoskeletal proteins, including the alpha and beta spectrins and dystrophins. Alpha actinin is an actin-binding protein with multiple roles in different cell types. In nonmuscle cells, the cytoskeletal isoform is found along microfilament bundles and adherens-type junctions, where it is involved in binding actin to the membrane. In contrast, skeletal, cardiac, and smooth muscle isoforms are localized to the Z-disc and analogous dense bodies, where they help anchor the myofibrillar actin filaments. This gene encodes a nonmuscle, cytoskeletal, alpha actinin isoform and maps to the same site as the structurally similar erythroid beta spectrin gene. Three transcript variants encoding different isoforms have been found for this gene. 87 actinin alpha 1 NA
ENSG00000175445 LPL LPL encodes lipoprotein lipase, which is expressed in heart, muscle, and adipose tissue. LPL functions as a homodimer, and has the dual functions of triglyceride hydrolase and ligand/bridging factor for receptor-mediated lipoprotein uptake. Severe mutations that cause LPL deficiency result in type I hyperlipoproteinemia, while less extreme mutations in LPL are linked to many disorders of lipoprotein metabolism. 4023 lipoprotein lipase NA
ENSG00000153002 CPB1 Three different procarboxypeptidases A and two different procarboxypeptidases B have been isolated. The B1 and B2 forms differ from each other mainly in isoelectric point. Carboxypeptidase B1 is a highly tissue-specific protein and is a useful serum marker for acute pancreatitis and dysfunction of pancreatic transplants. It is not elevated in pancreatic carcinoma. 1360 carboxypeptidase B1 NA
ENSG00000158050 DUSP2 The protein encoded by this gene is a member of the dual specificity protein phosphatase subfamily. These phosphatases inactivate their target kinases by dephosphorylating both the phosphoserine/threonine and phosphotyrosine residues. They negatively regulate members of the mitogen-activated protein (MAP) kinase superfamily (MAPK/ERK, SAPK/JNK, p38), which are associated with cellular proliferation and differentiation. Different members of the family of dual specificity phosphatases show distinct substrate specificities for various MAP kinases, different tissue distribution and subcellular localization, and different modes of inducibility of their expression by extracellular stimuli. This gene product inactivates ERK1 and ERK2, is predominantly expressed in hematopoietic tissues, and is localized in the nucleus. 1844 dual specificity phosphatase 2 NA
ENSG00000196531 NACA This gene encodes a protein that associates with basic transcription factor 3 (BTF3) to form the nascent polypeptide-associated complex (NAC). This complex binds to nascent proteins that lack a signal peptide motif as they emerge from the ribosome, blocking interaction with the signal recognition particle (SRP) and preventing mistranslocation to the endoplasmic reticulum. This protein is an IgE autoantigen in atopic dermatitis patients. Alternative splicing results in multiple transcript variants, but the full length nature of some of these variants, including those encoding very large proteins, has not been determined. There are multiple pseudogenes of this gene on different chromosomes. 4666 nascent polypeptide-associated complex alpha subunit NA
ENSG00000120738 EGR1 The protein encoded by this gene belongs to the EGR family of C2H2-type zinc-finger proteins. It is a nuclear protein and functions as a transcriptional regulator. The products of target genes it activates are required for differentiation and mitogenesis. Studies suggest this is a cancer suppressor gene. 1958 early growth response 1 NA
ENSG00000134339 SAA2 NA 6289 serum amyloid A2 NA
ENSG00000138356 AOX1 Aldehyde oxidase produces hydrogen peroxide and, under certain conditions, can catalyze the formation of superoxide. Aldehyde oxidase is a candidate gene for amyotrophic lateral sclerosis. 316 aldehyde oxidase 1 NA
ENSG00000138207 RBP4 This protein belongs to the lipocalin family and is the specific carrier for retinol (vitamin A alcohol) in the blood. It delivers retinol from the liver stores to the peripheral tissues. In plasma, the RBP-retinol complex interacts with transthyretin which prevents its loss by filtration through the kidney glomeruli. A deficiency of vitamin A blocks secretion of the binding protein posttranslationally and results in defective delivery and supply to the epidermal cells. 5950 retinol binding protein 4 NA
ENSG00000112149 CD83 The protein encoded by this gene is a single-pass type I membrane protein and member of the immunoglobulin superfamily of receptors. The encoded protein may be involved in the regulation of antigen presentation. A soluble form of this protein can bind to dendritic cells and inhibit their maturation. Three transcript variants encoding different isoforms have been found for this gene. 9308 CD83 molecule NA
ENSG00000143549 TPM3 This gene encodes a member of the tropomyosin family of actin-binding proteins. Tropomyosins are dimers of coiled-coil proteins that provide stability to actin filaments and regulate access of other actin-binding proteins. Mutations in this gene result in autosomal dominant nemaline myopathy and other muscle disorders. This locus is involved in translocations with other loci, including anaplastic lymphoma receptor tyrosine kinase (ALK) and neurotrophic tyrosine kinase receptor type 1 (NTRK1), which result in the formation of fusion proteins that act as oncogenes. There are numerous pseudogenes for this gene on different chromosomes. Alternative splicing results in multiple transcript variants. 7170 tropomyosin 3 NA
ENSG00000134571 MYBPC3 MYBPC3 encodes the cardiac isoform of myosin-binding protein C. Myosin-binding protein C is a myosin-associated protein found in the cross-bridge-bearing zone (C region) of A bands in striated muscle. MYBPC3, the cardiac isoform, is expressed exclusively in heart muscle. Regulatory phosphorylation of the cardiac isoform in vivo by cAMP-dependent protein kinase (PKA) upon adrenergic stimulation may be linked to modulation of cardiac contraction. Mutations in MYBPC3 are one cause of familial hypertrophic cardiomyopathy. 4607 myosin binding protein C, cardiac NA
ENSG00000170989 S1PR1 The protein encoded by this gene is structurally similar to G protein-coupled receptors and is highly expressed in endothelial cells. It binds the ligand sphingosine-1-phosphate with high affinity and high specificity, and is suggested to be involved in the processes that regulate the differentiation of endothelial cells. Activation of this receptor induces cell-cell adhesion. Alternative splicing results in multiple transcript variants. 1901 sphingosine-1-phosphate receptor 1 NA
ENSG00000014641 MDH1 This gene encodes an enzyme that catalyzes the NAD/NADH-dependent, reversible oxidation of malate to oxaloacetate in many metabolic pathways, including the citric acid cycle. Two main isozymes are known to exist in eukaryotic cells: one is found in the mitochondrial matrix and the other in the cytoplasm. This gene encodes the cytosolic isozyme, which plays a key role in the malate-aspartate shuttle that allows malate to pass through the mitochondrial membrane to be transformed into oxaloacetate for further cellular processes. Alternatively spliced transcript variants have been found for this gene. A recent study showed that a C-terminally extended isoform is produced by use of an alternative in-frame translation termination codon via a stop codon readthrough mechanism, and that this isoform is localized in the peroxisomes. Pseudogenes have been identified on chromosomes X and 6. 4190 malate dehydrogenase 1 NA
ENSG00000164056 SPRY1 NA 10252 sprouty RTK signaling antagonist 1 NA
ENSG00000137392 CLPS The protein encoded by this gene is a cofactor needed by pancreatic lipase for efficient dietary lipid hydrolysis. It binds to the C-terminal, non-catalytic domain of lipase, thereby stabilizing an active conformation and considerably increasing the overall hydrophobic binding site. The gene product allows lipase to anchor noncovalently to the surface of lipid micelles, counteracting the destabilizing influence of intestinal bile salts. This cofactor is only expressed in pancreatic acinar cells, suggesting regulation of expression by tissue-specific elements. Three transcript variants encoding different isoforms have been found for this gene. 1208 colipase NA
ENSG00000109061 MYH1 Myosin is a major contractile protein which converts chemical energy into mechanical energy through the hydrolysis of ATP. Myosin is a hexameric protein composed of a pair of myosin heavy chains (MYH) and two pairs of nonidentical light chains. Myosin heavy chains are encoded by a multigene family. In mammals at least 10 different myosin heavy chain (MYH) isoforms have been described from striated, smooth, and nonmuscle cells. These isoforms show expression that is spatially and temporally regulated during development. 4619 myosin, heavy chain 1, skeletal muscle, adult NA
ENSG00000147459 DOCK5 NA 80005 dedicator of cytokinesis 5 NA
ENSG00000269926 RP11-442H21.2 NA ENSG00000269926 NA NA
ENSG00000151914 DST This gene encodes a member of the plakin protein family of adhesion junction plaque proteins. Multiple alternatively spliced transcript variants encoding distinct isoforms have been found for this gene, but the full-length nature of some variants has not been defined. It has been reported that some isoforms are expressed in neural and muscle tissue, anchoring neural intermediate filaments to the actin cytoskeleton, and some isoforms are expressed in epithelial tissue, anchoring keratin-containing intermediate filaments to hemidesmosomes. Consistent with the expression, mice defective for this gene show skin blistering and neurodegeneration. 667 dystonin NA
ENSG00000125730 C3 Complement component C3 plays a central role in the activation of complement system. Its activation is required for both classical and alternative complement activation pathways. The encoded preproprotein is proteolytically processed to generate alpha and beta subunits that form the mature protein, which is then further processed to generate numerous peptide products. The C3a peptide, also known as the C3a anaphylatoxin, modulates inflammation and possesses antimicrobial activity. Mutations in this gene are associated with atypical hemolytic uremic syndrome and age-related macular degeneration in human patients. 718 complement component 3 NA
ENSG00000219073 CELA3B Elastases form a subfamily of serine proteases that hydrolyze many proteins in addition to elastin. Humans have six elastase genes which encode the structurally similar proteins elastase 1, 2, 2A, 2B, 3A, and 3B. Unlike other elastases, elastase 3B has little elastolytic activity. Like most of the human elastases, elastase 3B is secreted from the pancreas as a zymogen and, like other serine proteases such as trypsin, chymotrypsin and kallikrein, it has a digestive function in the intestine. Elastase 3B preferentially cleaves proteins after alanine residues. Elastase 3B may also function in the intestinal transport and metabolism of cholesterol. Both elastase 3A and elastase 3B have been referred to as protease E and as elastase 1, and excretion of this protein in fecal material is frequently used as a measure of pancreatic function in clinical assays. 23436 chymotrypsin like elastase family member 3B NA
ENSG00000125414 MYH2 Myosins are actin-based motor proteins that function in the generation of mechanical force in eukaryotic cells. Muscle myosins are heterohexamers composed of 2 myosin heavy chains and 2 pairs of nonidentical myosin light chains. This gene encodes a member of the class II or conventional myosin heavy chains, and functions in skeletal muscle contraction. This gene is found in a cluster of myosin heavy chain genes on chromosome 17. A mutation in this gene results in inclusion body myopathy-3. Multiple alternatively spliced variants, encoding the same protein, have been identified. 4620 myosin, heavy chain 2, skeletal muscle, adult NA
ENSG00000089157 RPLP0 Ribosomes, the organelles that catalyze protein synthesis, consist of a small 40S subunit and a large 60S subunit. Together these subunits are composed of 4 RNA species and approximately 80 structurally distinct proteins. This gene encodes a ribosomal protein that is a component of the 60S subunit. The protein, which is the functional equivalent of the E. coli L10 ribosomal protein, belongs to the L10P family of ribosomal proteins. It is a neutral phosphoprotein with a C-terminal end that is nearly identical to the C-terminal ends of the acidic ribosomal phosphoproteins P1 and P2. The P0 protein can interact with P1 and P2 to form a pentameric complex consisting of P1 and P2 dimers, and a P0 monomer. The protein is located in the cytoplasm. Transcript variants derived from alternative splicing exist; they encode the same protein. As is typical for genes encoding ribosomal proteins, there are multiple processed pseudogenes of this gene dispersed through the genome. 6175 ribosomal protein lateral stalk subunit P0 NA
ENSG00000118194 TNNT2 The protein encoded by this gene is the tropomyosin-binding subunit of the troponin complex, which is located on the thin filament of striated muscles and regulates muscle contraction in response to alterations in intracellular calcium ion concentration. Mutations in this gene have been associated with familial hypertrophic cardiomyopathy as well as with dilated cardiomyopathy. Transcripts for this gene undergo alternative splicing that results in many tissue-specific isoforms, however, the full-length nature of some of these variants has not yet been determined. 7139 troponin T2, cardiac type NA
ENSG00000170315 UBB This gene encodes ubiquitin, one of the most conserved proteins known. Ubiquitin has a major role in targeting cellular proteins for degradation by the 26S proteasome. It is also involved in the maintenance of chromatin structure, the regulation of gene expression, and the stress response. Ubiquitin is synthesized as a precursor protein consisting of either polyubiquitin chains or a single ubiquitin moiety fused to an unrelated protein. This gene consists of three direct repeats of the ubiquitin coding sequence with no spacer sequence. Consequently, the protein is expressed as a polyubiquitin precursor with a final amino acid after the last repeat. An aberrant form of this protein has been detected in patients with Alzheimer’s disease and Down syndrome. Pseudogenes of this gene are located on chromosomes 1, 2, 13, and 17. Alternative splicing results in multiple transcript variants. 7314 ubiquitin B NA
ENSG00000035862 TIMP2 This gene is a member of the TIMP gene family. The proteins encoded by this gene family are natural inhibitors of the matrix metalloproteinases, a group of peptidases involved in degradation of the extracellular matrix. In addition to an inhibitory role against metalloproteinases, the encoded protein has a unique role among TIMP family members in its ability to directly suppress the proliferation of endothelial cells. As a result, the encoded protein may be critical to the maintenance of tissue homeostasis by suppressing the proliferation of quiescent tissues in response to angiogenic factors, and by inhibiting protease activity in tissues undergoing remodelling of the extracellular matrix. 7077 TIMP metallopeptidase inhibitor 2 NA
ENSG00000166923 GREM1 This gene encodes a member of the BMP (bone morphogenic protein) antagonist family. Like BMPs, BMP antagonists contain cystine knots and typically form homo- and heterodimers. The CAN (cerberus and dan) subfamily of BMP antagonists, to which this gene belongs, is characterized by a C-terminal cystine knot with an eight-membered ring. The antagonistic effect of the secreted glycosylated protein encoded by this gene is likely due to its direct binding to BMP proteins. As an antagonist of BMP, this gene may play a role in regulating organogenesis, body patterning, and tissue differentiation. In mouse, this protein has been shown to relay the sonic hedgehog (SHH) signal from the polarizing region to the apical ectodermal ridge during limb bud outgrowth. Alternatively spliced transcript variants encoding different isoforms have been found for this gene. 26585 gremlin 1, DAN family BMP antagonist NA
ENSG00000135447 PPP1R1A NA 5502 protein phosphatase 1 regulatory inhibitor subunit 1A NA
ENSG00000259716 NA NA NA NA TRUE
ENSG00000197971 MBP The protein encoded by the classic MBP gene is a major constituent of the myelin sheath of oligodendrocytes and Schwann cells in the nervous system. However, MBP-related transcripts are also present in the bone marrow and the immune system. These mRNAs arise from the long MBP gene (otherwise called ‘Golli-MBP’) that contains 3 additional exons located upstream of the classic MBP exons. Alternative splicing from the Golli and the MBP transcription start sites gives rise to 2 sets of MBP-related transcripts and gene products. The Golli mRNAs contain 3 exons unique to Golli-MBP, spliced in-frame to 1 or more MBP exons. They encode hybrid proteins that have N-terminal Golli aa sequence linked to MBP aa sequence. The second family of transcripts contain only MBP exons and produce the well characterized myelin basic proteins. This complex gene structure is conserved among species suggesting that the MBP transcription unit is an integral part of the Golli transcription unit and that this arrangement is important for the function and/or regulation of these genes. 4155 myelin basic protein NA
ENSG00000106211 HSPB1 The protein encoded by this gene is induced by environmental stress and developmental changes. The encoded protein is involved in stress resistance and actin organization and translocates from the cytoplasm to the nucleus upon stress induction. Defects in this gene are a cause of Charcot-Marie-Tooth disease type 2F (CMT2F) and distal hereditary motor neuropathy (dHMN). 3315 heat shock protein family B (small) member 1 NA
ENSG00000135842 FAM129A NA 116496 family with sequence similarity 129 member A NA
ENSG00000129353 SLC44A2 NA 57153 solute carrier family 44 member 2 NA
ENSG00000111341 MGP The protein encoded by this gene is secreted and likely acts as an inhibitor of bone formation. The encoded protein is found in the organic matrix of bone and cartilage. Defects in this gene are a cause of Keutel syndrome (KS). Two transcript variants encoding different isoforms have been found for this gene. 4256 matrix Gla protein NA
ENSG00000070756 PABPC1 This gene encodes a poly(A) binding protein. The protein shuttles between the nucleus and cytoplasm and binds to the 3’ poly(A) tail of eukaryotic messenger RNAs via RNA-recognition motifs. The binding of this protein to poly(A) promotes ribosome recruitment and translation initiation; it is also required for poly(A) shortening which is the first step in mRNA decay. The gene is part of a small gene family including three protein-coding genes and several pseudogenes. 26986 poly(A) binding protein cytoplasmic 1 NA
ENSG00000115541 HSPE1 This gene encodes a major heat shock protein which functions as a chaperonin. Its structure consists of a heptameric ring which binds to another heat shock protein in order to form a symmetric, functional heterodimer which enhances protein folding in an ATP-dependent manner. This gene and its co-chaperonin, HSPD1, are arranged in a head-to-head orientation on chromosome 2. Naturally occurring read-through transcription occurs between this locus and the neighboring locus MOBKL3. 3336 heat shock protein family E (Hsp10) member 1 NA
# Save the Ensembl IDs queried for factor 6, one per line, for downstream use.
write.table(as.factor(out$query),
            paste0("../utilities/GTEX2013_sparse_fac_sqrt/gene_names_clus_", 6, ".txt"),
            col.names = FALSE, row.names = FALSE, quote = FALSE)
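
The export step is repeated for each factor with only the factor index changing, so it could be wrapped in a small helper. The following is a minimal sketch, not part of the original analysis: `write_factor_genes` is a hypothetical function name, and dropping duplicated or missing IDs is an assumption (the `write.table` call above writes `out$query` as returned). The demo uses a made-up ID vector and a temporary directory instead of `../utilities/GTEX2013_sparse_fac_sqrt/`.

```r
# Hypothetical helper (not in the original analysis): write the query IDs of
# one factor to "gene_names_clus_<factor_index>.txt", one ID per line.
write_factor_genes <- function(query_ids, factor_index, out_dir) {
  ids <- unique(as.character(query_ids))  # assumption: keep each ID once
  ids <- ids[!is.na(ids)]                 # assumption: drop missing IDs
  out_file <- file.path(out_dir, paste0("gene_names_clus_", factor_index, ".txt"))
  writeLines(ids, out_file)
  invisible(out_file)                     # return the path for later checks
}

# Demo with illustrative input and a temporary output directory.
demo_ids  <- c("ENSG00000197616", "ENSG00000150991", "ENSG00000197616", NA)
demo_file <- write_factor_genes(demo_ids, 6, tempdir())
readLines(demo_file)
```

In the report itself, the call would presumably look like `write_factor_genes(out$query, 6, "../utilities/GTEX2013_sparse_fac_sqrt")`, repeated with the matching index after each `queryMany` call.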

Factor 7 Annotations

# Annotate the top genes of factor 7 via mygene (requires network access).
out <- mygene::queryMany(gene_list[7, ], scopes = "ensembl.gene", fields = c("name", "summary", "symbol"), species = "human")
## Finished
kable(as.data.frame(out))
name summary X_id query symbol
keratin 13 The protein encoded by this gene is a member of the keratin gene family. The keratins are intermediate filament proteins responsible for the structural integrity of epithelial cells and are subdivided into cytokeratins and hair keratins. Most of the type I cytokeratins consist of acidic proteins which are arranged in pairs of heterotypic keratin chains. This type I cytokeratin is paired with keratin 4 and expressed in the suprabasal layers of non-cornified stratified epithelia. Mutations in this gene and keratin 4 have been associated with the autosomal dominant disorder White Sponge Nevus. The type I cytokeratins are clustered in a region of chromosome 17q21.2. Alternative splicing of this gene results in multiple transcript variants; however, not all variants have been described. 3860 ENSG00000171401 KRT13
keratin 4 The protein encoded by this gene is a member of the keratin gene family. The type II cytokeratins consist of basic or neutral proteins which are arranged in pairs of heterotypic keratin chains coexpressed during differentiation of simple and stratified epithelial tissues. This type II cytokeratin is specifically expressed in differentiated layers of the mucosal and esophageal epithelia with family member KRT13. Mutations in these genes have been associated with White Sponge Nevus, characterized by oral, esophageal, and anal leukoplakia. The type II cytokeratins are clustered in a region of chromosome 12q12-q13. 3851 ENSG00000170477 KRT4
small proline rich protein 3 NA 6707 ENSG00000163209 SPRR3
NA NA ENSG00000229732 ENSG00000229732 AC019349.5
keratin 6A The protein encoded by this gene is a member of the keratin gene family. The type II cytokeratins consist of basic or neutral proteins which are arranged in pairs of heterotypic keratin chains coexpressed during differentiation of simple and stratified epithelial tissues. As many as six of this type II cytokeratin (KRT6) have been identified; the multiplicity of the genes is attributed to successive gene duplication events. The genes are expressed with family members KRT16 and/or KRT17 in the filiform papillae of the tongue, the stratified epithelial lining of oral mucosa and esophagus, the outer root sheath of hair follicles, and the glandular epithelia. This KRT6 gene in particular encodes the most abundant isoform. Mutations in these genes have been associated with pachyonychia congenita. In addition, peptides from the C-terminal region of the protein have antimicrobial activity against bacterial pathogens. The type II cytokeratins are clustered in a region of chromosome 12q12-q13. 3853 ENSG00000205420 KRT6A
S100 calcium binding protein A9 The protein encoded by this gene is a member of the S100 family of proteins containing 2 EF-hand calcium-binding motifs. S100 proteins are localized in the cytoplasm and/or nucleus of a wide range of cells, and involved in the regulation of a number of cellular processes such as cell cycle progression and differentiation. S100 genes include at least 13 members which are located as a cluster on chromosome 1q21. This protein may function in the inhibition of casein kinase and altered expression of this protein is associated with the disease cystic fibrosis. This antimicrobial protein exhibits antifungal and antibacterial activity. 6280 ENSG00000163220 S100A9
annexin A1 This gene encodes a membrane-localized protein that binds phospholipids. This protein inhibits phospholipase A2 and has anti-inflammatory activity. Loss of function or expression of this gene has been detected in multiple tumors. 301 ENSG00000135046 ANXA1
cornulin This gene encodes a member of the ‘fused gene’ family of proteins, which contain N-terminus EF-hand domains and multiple tandem peptide repeats. The encoded protein contains two EF-hand Ca2+ binding domains in its N-terminus and two glutamine- and threonine-rich 60 amino acid repeats in its C-terminus. This gene, also known as squamous epithelial heat shock protein 53, may play a role in the mucosal/epithelial immune response and epidermal differentiation. 49860 ENSG00000143536 CRNN
Rh family C glycoprotein NA 51458 ENSG00000140519 RHCG
S100 calcium binding protein A8 The protein encoded by this gene is a member of the S100 family of proteins containing 2 EF-hand calcium-binding motifs. S100 proteins are localized in the cytoplasm and/or nucleus of a wide range of cells, and involved in the regulation of a number of cellular processes such as cell cycle progression and differentiation. S100 genes include at least 13 members which are located as a cluster on chromosome 1q21. This protein may function in the inhibition of casein kinase and as a cytokine. Altered expression of this protein is associated with the disease cystic fibrosis. Multiple transcript variants encoding different isoforms have been found for this gene. 6279 ENSG00000143546 S100A8
cystatin B The cystatin superfamily encompasses proteins that contain multiple cystatin-like sequences. Some of the members are active cysteine protease inhibitors, while others have lost or perhaps never acquired this inhibitory activity. There are three inhibitory families in the superfamily, including the type 1 cystatins (stefins), type 2 cystatins and kininogens. This gene encodes a stefin that functions as an intracellular thiol protease inhibitor. The protein is able to form a dimer stabilized by noncovalent forces, inhibiting papain and cathepsins l, h and b. The protein is thought to play a role in protecting against the proteases leaking from lysosomes. Evidence indicates that mutations in this gene are responsible for the primary defects in patients with progressive myoclonic epilepsy (EPM1). 1476 ENSG00000160213 CSTB
epithelial membrane protein 1 NA 2012 ENSG00000134531 EMP1
keratin 10 This gene encodes a member of the type I (acidic) cytokeratin family, which belongs to the superfamily of intermediate filament (IF) proteins. Keratins are heteropolymeric structural proteins which form the intermediate filament. These filaments, along with actin microfilaments and microtubules, compose the cytoskeleton of epithelial cells. Mutations in this gene are associated with epidermolytic hyperkeratosis. This gene is located within a cluster of keratin family members on chromosome 17q21. 3858 ENSG00000186395 KRT10
keratin 1 The protein encoded by this gene is a member of the keratin gene family. The type II cytokeratins consist of basic or neutral proteins which are arranged in pairs of heterotypic keratin chains coexpressed during differentiation of simple and stratified epithelial tissues. This type II cytokeratin is specifically expressed in the spinous and granular layers of the epidermis with family member KRT10 and mutations in these genes have been associated with bullous congenital ichthyosiform erythroderma. The type II cytokeratins are clustered in a region of chromosome 12q12-q13. 3848 ENSG00000167768 KRT1
S100 calcium binding protein A14 This gene encodes a member of the S100 protein family which contains an EF-hand motif and binds calcium. The gene is located in a cluster of S100 genes on chromosome 1. Levels of the encoded protein have been found to be lower in cancerous tissue and associated with metastasis suggesting a tumor suppressor function (PMID: 19956863, 19351828). 57402 ENSG00000189334 S100A14
mal, T-cell differentiation protein The protein encoded by this gene is a highly hydrophobic integral membrane protein belonging to the MAL family of proteolipids. The protein has been localized to the endoplasmic reticulum of T-cells and is a candidate linker protein in T-cell signal transduction. In addition, this proteolipid is localized in compact myelin of cells in the nervous system and has been implicated in myelin biogenesis and/or function. The protein plays a role in the formation, stabilization and maintenance of glycosphingolipid-enriched membrane microdomains. Down-regulation of this gene has been associated with a variety of human epithelial malignancies. Alternative splicing produces four transcript variants which vary from each other by the presence or absence of alternatively spliced exons 2 and 3. 4118 ENSG00000172005 MAL
transglutaminase 3 Transglutaminases are enzymes that catalyze the crosslinking of proteins by epsilon-gamma glutamyl lysine isopeptide bonds. While the primary structure of transglutaminases is not conserved, they all have the same amino acid sequence at their active sites and their activity is calcium-dependent. The protein encoded by this gene consists of two polypeptide chains activated from a single precursor protein by proteolysis. The encoded protein is involved in the later stages of cell envelope formation in the epidermis and hair follicle. 7053 ENSG00000125780 TGM3
small proline rich protein 2A NA 6700 ENSG00000241794 SPRR2A
cystatin A The cystatin superfamily encompasses proteins that contain multiple cystatin-like sequences. Some of the members are active cysteine protease inhibitors, while others have lost or perhaps never acquired this inhibitory activity. There are three inhibitory families in the superfamily, including the type 1 cystatins (stefins), type 2 cystatins, and kininogens. This gene encodes a stefin that functions as a cysteine protease inhibitor, forming tight complexes with papain and the cathepsins B, H, and L. The protein is one of the precursor proteins of cornified cell envelope in keratinocytes and plays a role in epidermal development and maintenance. Stefins have been proposed as prognostic and diagnostic tools for cancer. 1475 ENSG00000121552 CSTA
small proline rich protein 1A NA 6698 ENSG00000169474 SPRR1A
keratin 2 The protein encoded by this gene is a member of the keratin gene family. The type II cytokeratins consist of basic or neutral proteins which are arranged in pairs of heterotypic keratin chains coexpressed during differentiation of simple and stratified epithelial tissues. This type II cytokeratin is expressed largely in the upper spinous layer of epidermal keratinocytes and mutations in this gene have been associated with bullous congenital ichthyosiform erythroderma. The type II cytokeratins are clustered in a region of chromosome 12q12-q13. 3849 ENSG00000172867 KRT2
extracellular matrix protein 1 This gene encodes a soluble protein that is involved in endochondral bone formation, angiogenesis, and tumor biology. It also interacts with a variety of extracellular and structural proteins, contributing to the maintenance of skin integrity and homeostasis. Mutations in this gene are associated with lipoid proteinosis disorder (also known as hyalinosis cutis et mucosae or Urbach-Wiethe disease) that is characterized by generalized thickening of skin, mucosae and certain viscera. Alternatively spliced transcript variants encoding distinct isoforms have been described for this gene. 1893 ENSG00000143369 ECM1
family with sequence similarity 129 member B NA 64855 ENSG00000136830 FAM129B
S100 calcium binding protein A11 The protein encoded by this gene is a member of the S100 family of proteins containing 2 EF-hand calcium-binding motifs. S100 proteins are localized in the cytoplasm and/or nucleus of a wide range of cells, and involved in the regulation of a number of cellular processes such as cell cycle progression and differentiation. S100 genes include at least 13 members which are located as a cluster on chromosome 1q21. This protein may function in motility, invasion, and tubulin polymerization. Chromosomal rearrangements and altered expression of this gene have been implicated in tumor metastasis. 6282 ENSG00000163191 S100A11
S100 calcium binding protein A16 NA 140576 ENSG00000188643 S100A16
interleukin 1 receptor antagonist The protein encoded by this gene is a member of the interleukin 1 cytokine family. This protein inhibits the activities of interleukin 1, alpha (IL1A) and interleukin 1, beta (IL1B), and modulates a variety of interleukin 1 related immune and inflammatory responses. This gene and five other closely related cytokine genes form a gene cluster spanning approximately 400 kb on chromosome 2. A polymorphism of this gene is reported to be associated with increased risk of osteoporotic fractures and gastric cancer. Several alternatively spliced transcript variants encoding distinct isoforms have been reported. 3557 ENSG00000136689 IL1RN
gap junction protein beta 2 This gene encodes a member of the gap junction protein family. The gap junctions were first characterized by electron microscopy as regionally specialized structures on plasma membranes of contacting adherent cells. These structures were shown to consist of cell-to-cell channels that facilitate the transfer of ions and small molecules between cells. The gap junction proteins, also known as connexins, purified from fractions of enriched gap junctions from different tissues differ. According to sequence similarities at the nucleotide and amino acid levels, the gap junction proteins are divided into two categories, alpha and beta. Mutations in this gene are responsible for as much as 50% of pre-lingual, recessive deafness. 2706 ENSG00000165474 GJB2
serine peptidase inhibitor, Kazal type 5 This gene encodes a multidomain serine protease inhibitor that contains 15 potential inhibitory domains. The encoded preproprotein is proteolytically processed to generate multiple protein products, which may exhibit unique activities and specificities. These proteins may play a role in skin and hair morphogenesis, as well as anti-inflammatory and antimicrobial protection of mucous epithelia. Mutations in this gene may result in Netherton syndrome, a disorder characterized by ichthyosis, defective cornification, and atopy. This gene is present in a gene cluster on chromosome 5. Alternative splicing results in multiple transcript variants. 11005 ENSG00000133710 SPINK5
periplakin The protein encoded by this gene is a component of desmosomes and of the epidermal cornified envelope in keratinocytes. The N-terminal domain of this protein interacts with the plasma membrane and its C-terminus interacts with intermediate filaments. Through its rod domain, this protein forms complexes with envoplakin. This protein may serve as a link between the cornified envelope and desmosomes as well as intermediate filaments. AKT1/PKB, a protein kinase mediating a variety of cell growth and survival signaling processes, is reported to interact with this protein, suggesting a possible role for this protein as a localization signal in AKT1-mediated signaling. 5493 ENSG00000118898 PPL
desmocollin 2 This gene encodes a member of the desmocollin protein subfamily. Desmocollins, along with desmogleins, are cadherin-like transmembrane glycoproteins that are major components of the desmosome. Desmosomes are cell-cell junctions that help resist shearing forces and are found in high concentrations in cells subject to mechanical stress. This gene is found in a cluster with other desmocollin family members on chromosome 18. Mutations in this gene are associated with arrhythmogenic right ventricular dysplasia-11, and reduced protein expression has been described in several types of cancer. Alternative splicing results in multiple transcript variants. 1824 ENSG00000134755 DSC2
fatty acid binding protein 5 pseudogene 7 NA ENSG00000234964 ENSG00000234964 FABP5P7
S100 calcium binding protein A2 The protein encoded by this gene is a member of the S100 family of proteins containing 2 EF-hand calcium-binding motifs. S100 proteins are localized in the cytoplasm and/or nucleus of a wide range of cells, and involved in the regulation of a number of cellular processes such as cell cycle progression and differentiation. S100 genes include at least 13 members which are located as a cluster on chromosome 1q21. This protein may have a tumor suppressor function. Chromosomal rearrangements and altered expression of this gene have been implicated in breast cancer. 6273 ENSG00000196754 S100A2
stratifin NA 2810 ENSG00000175793 SFN
peptidase inhibitor 3 This gene encodes an elastase-specific inhibitor that functions as an antimicrobial peptide against Gram-positive and Gram-negative bacteria, and fungal pathogens. The protein contains a WAP-type four-disulfide core (WFDC) domain, and is thus a member of the WFDC domain family. Most WFDC gene members are localized to chromosome 20q12-q13 in two clusters: centromeric and telomeric. This gene belongs to the centromeric cluster. Expression of this gene is upregulated by bacterial lipopolysaccharides and cytokines. 5266 ENSG00000124102 PI3
keratin 19 The protein encoded by this gene is a member of the keratin family. The keratins are intermediate filament proteins responsible for the structural integrity of epithelial cells and are subdivided into cytokeratins and hair keratins. The type I cytokeratins consist of acidic proteins which are arranged in pairs of heterotypic keratin chains. Unlike its related family members, this smallest known acidic cytokeratin is not paired with a basic cytokeratin in epithelial cells. It is specifically expressed in the periderm, the transiently superficial layer that envelopes the developing epidermis. The type I cytokeratins are clustered in a region of chromosome 17q12-q21. 3880 ENSG00000171345 KRT19
transglutaminase 1 The protein encoded by this gene is a membrane protein that catalyzes the addition of an alkyl group from an alkylamine to a glutamine residue of a protein, forming an alkylglutamine in the protein. This protein alkylation leads to crosslinking of proteins and catenation of polyamines to proteins. This gene contains either one or two copies of a 22 nt repeat unit in its 3’ UTR. Mutations in this gene have been associated with autosomal recessive lamellar ichthyosis (LI) and nonbullous congenital ichthyosiform erythroderma (NCIE). 7051 ENSG00000092295 TGM1
phosphogluconate dehydrogenase 6-phosphogluconate dehydrogenase is the second dehydrogenase in the pentose phosphate shunt. Deficiency of this enzyme is generally asymptomatic, and the inheritance of this disorder is autosomal dominant. Hemolysis results from combined deficiency of 6-phosphogluconate dehydrogenase and 6-phosphogluconolactonase suggesting a synergism of the two enzymopathies. Several transcript variants encoding different isoforms have been found for this gene. 5226 ENSG00000142657 PGD
lipocalin 2 This gene encodes a protein that belongs to the lipocalin family. Members of this family transport small hydrophobic molecules such as lipids, steroid hormones and retinoids. The protein encoded by this gene is a neutrophil gelatinase-associated lipocalin and plays a role in innate immunity by limiting bacterial growth as a result of sequestering iron-containing siderophores. The presence of this protein in blood and urine is an early biomarker of acute kidney injury. This protein is thought to be involved in multiple cellular processes, including maintenance of skin homeostasis, and suppression of invasiveness and metastasis. Mice lacking this gene are more susceptible to bacterial infection than wild type mice. 3934 ENSG00000148346 LCN2
keratin 15 The protein encoded by this gene is a member of the keratin gene family. The keratins are intermediate filament proteins responsible for the structural integrity of epithelial cells and are subdivided into cytokeratins and hair keratins. Most of the type I cytokeratins consist of acidic proteins which are arranged in pairs of heterotypic keratin chains and are clustered in a region on chromosome 17q21.2. 3866 ENSG00000171346 KRT15
secretory leukocyte peptidase inhibitor This gene encodes a secreted inhibitor which protects epithelial tissues from serine proteases. It is found in various secretions including seminal plasma, cervical mucus, and bronchial secretions, and has affinity for trypsin, leukocyte elastase, and cathepsin G. Its inhibitory effect contributes to the immune response by protecting epithelial surfaces from attack by endogenous proteolytic enzymes. This antimicrobial protein has antibacterial, antifungal and antiviral activity. 6590 ENSG00000124107 SLPI
ras homolog family member B NA 388 ENSG00000143878 RHOB
aquaporin 3 (Gill blood group) This gene encodes the water channel protein aquaporin 3. Aquaporins are a family of small integral membrane proteins related to the major intrinsic protein, also known as aquaporin 0. Aquaporin 3 is localized at the basal lateral membranes of collecting duct cells in the kidney. In addition to its water channel function, aquaporin 3 has been found to facilitate the transport of nonionic small solutes such as urea and glycerol, but to a smaller degree. It has been suggested that water channels can be functionally heterogeneous and possess water and solute permeation mechanisms. Alternative splicing of this gene results in multiple transcript variants encoding different isoforms. 360 ENSG00000165272 AQP3
serpin family B member 1 The protein encoded by this gene is a member of the serpin family of proteinase inhibitors. Members of this family maintain homeostasis by neutralizing overexpressed proteinase activity through their function as suicide substrates. This protein inhibits the neutrophil-derived proteinases neutrophil elastase, cathepsin G, and proteinase-3 and thus protects tissues from damage at inflammatory sites. Alternative splicing results in multiple transcript variants. 1992 ENSG00000021355 SERPINB1
aldehyde dehydrogenase 3 family member A1 Aldehyde dehydrogenases oxidize various aldehydes to the corresponding acids. They are involved in the detoxification of alcohol-derived acetaldehyde and in the metabolism of corticosteroids, biogenic amines, neurotransmitters, and lipid peroxidation. The enzyme encoded by this gene forms a cytoplasmic homodimer that preferentially oxidizes aromatic and medium-chain (6 carbons or more) saturated and unsaturated aldehyde substrates. It is thought to promote resistance to UV and 4-hydroxy-2-nonenal-induced oxidative damage in the cornea. The gene is located within the Smith-Magenis syndrome region on chromosome 17. Multiple alternatively spliced variants, encoding the same protein, have been identified. 218 ENSG00000108602 ALDH3A1
tumor-associated calcium signal transducer 2 This intronless gene encodes a carcinoma-associated antigen. This antigen is a cell surface receptor that transduces calcium signals. Mutations of this gene have been associated with gelatinous drop-like corneal dystrophy. 4070 ENSG00000184292 TACSTD2
small proline rich protein 1B The protein encoded by this gene is an envelope protein of keratinocytes. The encoded protein is crosslinked to membrane proteins by transglutaminase, forming an insoluble layer under the plasma membrane. This protein is proline-rich and contains several tandem amino acid repeats. 6699 ENSG00000169469 SPRR1B
granulin Granulins are a family of secreted, glycosylated peptides that are cleaved from a single precursor protein with 7.5 repeats of a highly conserved 12-cysteine granulin/epithelin motif. The 88 kDa precursor protein, progranulin, is also called proepithelin and PC cell-derived growth factor. Cleavage of the signal peptide produces mature granulin which can be further cleaved into a variety of active, 6 kDa peptides. These smaller cleavage products are named granulin A, granulin B, granulin C, etc. Epithelins 1 and 2 are synonymous with granulins A and B, respectively. Both the peptides and intact granulin protein regulate cell growth. However, different members of the granulin protein family may act as inhibitors, stimulators, or have dual actions on cell growth. Granulin family members are important in normal development, wound healing, and tumorigenesis. 2896 ENSG00000030582 GRN
loricrin This gene encodes loricrin, a major protein component of the cornified cell envelope found in terminally differentiated epidermal cells. Mutations in this gene are associated with Vohwinkel’s syndrome and progressive symmetric erythrokeratoderma, both inherited skin diseases. 4014 ENSG00000203782 LOR
cellular retinoic acid binding protein 2 This gene encodes a member of the retinoic acid (RA, a form of vitamin A) binding protein family and lipocalin/cytosolic fatty-acid binding protein family. The protein is a cytosol-to-nuclear shuttling protein, which facilitates RA binding to its cognate receptor complex and transfer to the nucleus. It is involved in the retinoid signaling pathway, and is associated with increased circulating low-density lipoprotein cholesterol. Alternatively spliced transcript variants encoding the same protein have been found for this gene. 1382 ENSG00000143320 CRABP2
annexin A2 This gene encodes a member of the annexin family. Members of this calcium-dependent phospholipid-binding protein family play a role in the regulation of cellular growth and in signal transduction pathways. This protein functions as an autocrine factor which heightens osteoclast formation and bone resorption. This gene has three pseudogenes located on chromosomes 4, 9 and 10, respectively. Multiple alternatively spliced transcript variants encoding different isoforms have been found for this gene. 302 ENSG00000182718 ANXA2
S100 calcium binding protein A10 The protein encoded by this gene is a member of the S100 family of proteins containing 2 EF-hand calcium-binding motifs. S100 proteins are localized in the cytoplasm and/or nucleus of a wide range of cells, and involved in the regulation of a number of cellular processes such as cell cycle progression and differentiation. S100 genes include at least 13 members which are located as a cluster on chromosome 1q21. This protein may function in exocytosis and endocytosis. 6281 ENSG00000197747 S100A10
protease, serine 27 This gene is located within a large protease gene cluster on chromosome 16. It belongs to the group-1 subfamily of serine proteases. The encoded protein is a secreted tryptic serine protease and is expressed mainly in the pancreas. Alternative splicing results in multiple transcript variants. 83886 ENSG00000172382 PRSS27
EPS8 like 1 This gene encodes a protein that is related to epidermal growth factor receptor pathway substrate 8 (EPS8), a substrate for the epidermal growth factor receptor. The function of this protein is unknown. At least two alternatively spliced transcript variants encoding different isoforms have been found for this gene. 54869 ENSG00000131037 EPS8L1
catenin delta 1 This gene encodes a member of the Armadillo protein family, which function in adhesion between cells and signal transduction. Multiple translation initiation codons and alternative splicing result in many different isoforms being translated. Not all of the full-length natures of the described transcript variants have been determined. Read-through transcription also exists between this gene and the neighboring upstream thioredoxin-related transmembrane protein 2 (TMX2) gene. 1500 ENSG00000198561 CTNND1
fatty acid binding protein 5 This gene encodes the fatty acid binding protein found in epidermal cells, and was first identified as being upregulated in psoriasis tissue. Fatty acid binding proteins are a family of small, highly conserved, cytoplasmic proteins that bind long-chain fatty acids and other hydrophobic ligands. FABPs may play roles in fatty acid uptake, transport, and metabolism. Polymorphisms in this gene are associated with type 2 diabetes. The human genome contains many pseudogenes similar to this locus. 2171 ENSG00000164687 FABP5
calmodulin like 3 NA 810 ENSG00000178363 CALML3
cornifelin NA 84518 ENSG00000105427 CNFN
myelin protein zero like 2 Thymus development depends on a complex series of interactions between thymocytes and the stromal component of the organ. Epithelial V-like antigen (EVA) is expressed in thymus epithelium and strongly downregulated by thymocyte developmental progression. This gene is expressed in the thymus and in several epithelial structures early in embryogenesis. It is highly homologous to the myelin protein zero and, in thymus-derived epithelial cell lines, is poorly soluble in nonionic detergents, strongly suggesting an association to the cytoskeleton. Its capacity to mediate cell adhesion through a homophilic interaction and its selective regulation by T cell maturation might imply the participation of EVA in the earliest phases of thymus organogenesis. The protein bears a characteristic V-type domain and two potential N-glycosylation sites in the extracellular domain; a putative serine phosphorylation site for casein kinase 2 is also present in the cytoplasmic tail. Two transcript variants encoding the same protein have been found for this gene. 10205 ENSG00000149573 MPZL2
EPS8 like 2 This gene encodes a member of the EPS8 gene family. The encoded protein, like other members of the family, is thought to link growth factor stimulation to actin organization, generating functional redundancy in the pathways that regulate actin cytoskeletal remodeling. 64787 ENSG00000177106 EPS8L2
RAB10, member RAS oncogene family RAB10 belongs to the RAS (see HRAS; MIM 190020) superfamily of small GTPases. RAB proteins localize to exocytic and endocytic compartments and regulate intracellular vesicle trafficking (Bao et al., 1998 [PubMed 9918381]). 10890 ENSG00000084733 RAB10
fatty acyl-CoA reductase 1 The protein encoded by this gene is required for the reduction of fatty acids to fatty alcohols, a process that is required for the synthesis of monoesters and ether lipids. NADPH is required as a cofactor in this reaction, and 16-18 carbon saturated and unsaturated fatty acids are the preferred substrate. This is a peroxisomal membrane protein, and studies suggest that the N-terminus contains a large catalytic domain located on the outside of the peroxisome, while the C-terminus is exposed to the matrix of the peroxisome. Studies indicate that the regulation of this protein is dependent on plasmalogen levels. Mutations in this gene have been associated with individuals affected by severe intellectual disability, early-onset epilepsy, microcephaly, congenital cataracts, growth retardation, and spasticity (PMID: 25439727). A pseudogene of this gene is located on chromosome 13. 84188 ENSG00000197601 FAR1
EPH receptor A2 This gene belongs to the ephrin receptor subfamily of the protein-tyrosine kinase family. EPH and EPH-related receptors have been implicated in mediating developmental events, particularly in the nervous system. Receptors in the EPH subfamily typically have a single kinase domain and an extracellular region containing a Cys-rich domain and 2 fibronectin type III repeats. The ephrin receptors are divided into 2 groups based on the similarity of their extracellular domain sequences and their affinities for binding ephrin-A and ephrin-B ligands. This gene encodes a protein that binds ephrin-A ligands. Mutations in this gene are the cause of certain genetically-related cataract disorders. 1969 ENSG00000142627 EPHA2
transmembrane protein 45A NA 55076 ENSG00000181458 TMEM45A
actin binding LIM protein 1 This gene encodes a cytoskeletal LIM protein that binds to actin filaments via a domain that is homologous to erythrocyte dematin. LIM domains, found in over 60 proteins, play key roles in the regulation of developmental pathways. LIM domains also function as protein-binding interfaces, mediating specific protein-protein interactions. The protein encoded by this gene could mediate such interactions between actin filaments and cytoplasmic targets. Alternatively spliced transcript variants encoding different isoforms have been identified. 3983 ENSG00000099204 ABLIM1
NA NA ENSG00000249007 ENSG00000249007 RP11-510N19.5
cysteine rich C-terminal 1 NA 54544 ENSG00000169509 CRCT1
absent in melanoma 1 NA 202 ENSG00000112297 AIM1
calmodulin like 5 This gene encodes a novel calcium binding protein expressed in the epidermis and related to the calmodulin family of calcium binding proteins. Functional studies with recombinant protein demonstrate it does bind calcium and undergoes a conformational change when it does so. Abundant expression is detected only in reconstructed epidermis and is restricted to differentiating keratinocytes. In addition, it can associate with transglutaminase 3, shown to be a key enzyme in the terminal differentiation of keratinocytes. 51806 ENSG00000178372 CALML5
aspartic peptidase, retroviral-like 1 NA 151516 ENSG00000244617 ASPRV1
Rho GTPase activating protein 27 This gene encodes a member of a large family of proteins that activate Rho-type guanosine triphosphate (GTP) metabolizing enzymes. The encoded protein may play a role in clathrin-mediated endocytosis. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene. 201176 ENSG00000159314 ARHGAP27
quiescin sulfhydryl oxidase 1 This gene encodes a protein that contains domains of thioredoxin and ERV1, members of two long-standing gene families. The gene expression is induced as fibroblasts begin to exit the proliferative cycle and enter quiescence, suggesting that this gene plays an important role in growth regulation. Two transcript variants encoding two different isoforms have been found for this gene. 5768 ENSG00000116260 QSOX1
endoplasmic reticulum oxidoreductase alpha NA 30001 ENSG00000197930 ERO1A
galectin 3 binding protein The galectins are a family of beta-galactoside-binding proteins implicated in modulating cell-cell and cell-matrix interactions. LGALS3BP has been found elevated in the serum of patients with cancer and in those infected by the human immunodeficiency virus (HIV). It appears to be implicated in immune response associated with natural killer (NK) and lymphokine-activated killer (LAK) cell cytotoxicity. Using fluorescence in situ hybridization the full length 90K cDNA has been localized to chromosome 17q25. The native protein binds specifically to a human macrophage-associated lectin known as Mac-2 and also binds galectin 1. 3959 ENSG00000108679 LGALS3BP
GIPC PDZ domain containing family member 1 GIPC1 is a scaffolding protein that regulates cell surface receptor expression and trafficking (Lee et al., 2008 [PubMed 18775991]). 10755 ENSG00000123159 GIPC1
keratin 5 The protein encoded by this gene is a member of the keratin gene family. The type II cytokeratins consist of basic or neutral proteins which are arranged in pairs of heterotypic keratin chains coexpressed during differentiation of simple and stratified epithelial tissues. This type II cytokeratin is specifically expressed in the basal layer of the epidermis with family member KRT14. Mutations in these genes have been associated with a complex of diseases termed epidermolysis bullosa simplex. The type II cytokeratins are clustered in a region of chromosome 12q12-q13. 3852 ENSG00000186081 KRT5
prominin 2 This gene encodes a member of the prominin family of pentaspan membrane glycoproteins. The encoded protein localizes to basal epithelial cells and may be involved in the organization of plasma membrane microdomains. Alternative splicing results in multiple transcript variants. 150696 ENSG00000155066 PROM2
carboxylesterase 2 This gene encodes a member of the carboxylesterase large family. The family members are responsible for the hydrolysis or transesterification of various xenobiotics, such as cocaine and heroin, and endogenous substrates with ester, thioester, or amide bonds. They may participate in fatty acyl and cholesterol ester metabolism, and may play a role in the blood-brain barrier system. The protein encoded by this gene is the major intestinal enzyme and functions in intestine drug clearance. Alternatively spliced transcript variants have been found for this gene. 8824 ENSG00000172831 CES2
ATP binding cassette subfamily C member 5 The protein encoded by this gene is a member of the superfamily of ATP-binding cassette (ABC) transporters. ABC proteins transport various molecules across extra- and intra-cellular membranes. ABC genes are divided into seven distinct subfamilies (ABC1, MDR/TAP, MRP, ALD, OABP, GCN20, White). This protein is a member of the MRP subfamily which is involved in multi-drug resistance. This protein functions in the cellular export of its substrate, cyclic nucleotides. This export contributes to the degradation of phosphodiesterases and possibly an elimination pathway for cyclic nucleotides. Studies show that this protein provides resistance to thiopurine anticancer drugs, 6-mercaptopurine and thioguanine, and the anti-HIV drug 9-(2-phosphonylmethoxyethyl)adenine. This protein may be involved in resistance to thiopurines in acute lymphoblastic leukemia and antiretroviral nucleoside analogs in HIV-infected patients. Alternative splicing results in multiple transcript variants. 10057 ENSG00000114770 ABCC5
carcinoembryonic antigen related cell adhesion molecule 1 This gene encodes a member of the carcinoembryonic antigen (CEA) gene family, which belongs to the immunoglobulin superfamily. Two subgroups of the CEA family, the CEA cell adhesion molecules and the pregnancy-specific glycoproteins, are located within a 1.2 Mb cluster on the long arm of chromosome 19. Eleven pseudogenes of the CEA cell adhesion molecule subgroup are also found in the cluster. The encoded protein was originally described in bile ducts of liver as biliary glycoprotein. Subsequently, it was found to be a cell-cell adhesion molecule detected on leukocytes, epithelia, and endothelia. The encoded protein mediates cell adhesion via homophilic as well as heterophilic binding to other proteins of the subgroup. Multiple cellular activities have been attributed to the encoded protein, including roles in the differentiation and arrangement of tissue three-dimensional structure, angiogenesis, apoptosis, tumor suppression, metastasis, and the modulation of innate and adaptive immune responses. Multiple transcript variants encoding different isoforms have been reported, but the full-length nature of all variants has not been defined. 634 ENSG00000079385 CEACAM1
dermcidin This antimicrobial gene encodes a secreted protein that is subsequently processed into mature peptides of distinct biological activities. The C-terminal peptide is constitutively expressed in sweat and has antibacterial and antifungal activities. The N-terminal peptide, also known as diffusible survival evasion peptide, promotes neural cell survival under conditions of severe oxidative stress. A glycosylated form of the N-terminal peptide may be associated with cachexia (muscle wasting) in cancer patients. Alternative splicing results in multiple transcript variants encoding different isoforms. 117159 ENSG00000161634 DCD
NAD(P)H quinone dehydrogenase 1 This gene is a member of the NAD(P)H dehydrogenase (quinone) family and encodes a cytoplasmic 2-electron reductase. This FAD-binding protein forms homodimers and reduces quinones to hydroquinones. This protein’s enzymatic activity prevents the one electron reduction of quinones that results in the production of radical species. Mutations in this gene have been associated with tardive dyskinesia (TD), an increased risk of hematotoxicity after exposure to benzene, and susceptibility to various forms of cancer. Altered expression of this protein has been seen in many tumors and is also associated with Alzheimer’s disease (AD). Alternate transcriptional splice variants, encoding different isoforms, have been characterized. 1728 ENSG00000181019 NQO1
inter-alpha-trypsin inhibitor heavy chain 3 This gene encodes the heavy chain subunit of the pre-alpha-trypsin inhibitor complex. This complex may stabilize the extracellular matrix through its ability to bind hyaluronic acid. Polymorphisms of this gene may be associated with increased risk for schizophrenia and major depressive disorder. This gene is present in an inter-alpha-trypsin inhibitor family gene cluster on chromosome 3. 3699 ENSG00000162267 ITIH3
albumin Albumin is a soluble, monomeric protein which comprises about one-half of the blood serum protein. Albumin functions primarily as a carrier protein for steroids, fatty acids, and thyroid hormones and plays a role in stabilizing extracellular fluid volume. Albumin is a globular unglycosylated serum protein of molecular weight 65,000. Albumin is synthesized in the liver as preproalbumin which has an N-terminal peptide that is removed before the nascent protein is released from the rough endoplasmic reticulum. The product, proalbumin, is in turn cleaved in the Golgi vesicles to produce the secreted albumin. 213 ENSG00000163631 ALB
thioredoxin The protein encoded by this gene acts as a homodimer and is involved in many redox reactions. The encoded protein is active in the reversible S-nitrosylation of cysteines in certain proteins, which is part of the response to intracellular nitric oxide. This protein is found in the cytoplasm. Two transcript variants encoding different isoforms have been found for this gene. 7295 ENSG00000136810 TXN
keratin 16 The protein encoded by this gene is a member of the keratin gene family. The keratins are intermediate filament proteins responsible for the structural integrity of epithelial cells and are subdivided into cytokeratins and hair keratins. Most of the type I cytokeratins consist of acidic proteins which are arranged in pairs of heterotypic keratin chains and are clustered in a region of chromosome 17q12-q21. This keratin has been coexpressed with keratin 14 in a number of epithelial tissues, including esophagus, tongue, and hair follicles. Mutations in this gene are associated with type 1 pachyonychia congenita, non-epidermolytic palmoplantar keratoderma and unilateral palmoplantar verrucous nevus. 3868 ENSG00000186832 KRT16
RAB25, member RAS oncogene family The protein encoded by this gene is a member of the RAS superfamily of small GTPases. The encoded protein is involved in membrane trafficking and cell survival. This gene has been found to be a tumor suppressor and an oncogene, depending on the context. Two variants, one protein-coding and the other not, have been found for this gene. 57111 ENSG00000132698 RAB25
vimentin This gene encodes a member of the intermediate filament family. Intermediate filaments, along with microtubules and actin microfilaments, make up the cytoskeleton. The protein encoded by this gene is responsible for maintaining cell shape, integrity of the cytoplasm, and stabilizing cytoskeletal interactions. It is also involved in the immune response, and controls the transport of low-density lipoprotein (LDL)-derived cholesterol from a lysosome to the site of esterification. It functions as an organizer of a number of critical proteins involved in attachment, migration, and cell signaling. Mutations in this gene cause a dominant, pulverulent cataract. 7431 ENSG00000026025 VIM
keratinocyte differentiation associated protein This gene encodes a protein which may function in the regulation of keratinocyte differentiation and maintenance of stratified epithelia. Multiple transcript variants encoding different isoforms have been found for this gene. 388533 ENSG00000188508 KRTDAP
calpain 2 The calpains, calcium-activated neutral proteases, are nonlysosomal, intracellular cysteine proteases. The mammalian calpains include ubiquitous, stomach-specific, and muscle-specific proteins. The ubiquitous enzymes consist of heterodimers with distinct large, catalytic subunits associated with a common small, regulatory subunit. This gene encodes the large subunit of the ubiquitous enzyme, calpain 2. Multiple heterogeneous transcriptional start sites in the 5’ UTR have been reported. Two transcript variants encoding different isoforms have been found for this gene. 824 ENSG00000162909 CAPN2
calpain 1 The calpains, calcium-activated neutral proteases, are nonlysosomal, intracellular cysteine proteases. The mammalian calpains include ubiquitous, stomach-specific, and muscle-specific proteins. The ubiquitous enzymes consist of heterodimers with distinct large, catalytic subunits associated with a common small, regulatory subunit. This gene encodes the large subunit of the ubiquitous enzyme, calpain 1. Several transcript variants encoding two different isoforms have been found for this gene. 823 ENSG00000014216 CAPN1
tyrosine 3-monooxygenase/tryptophan 5-monooxygenase activation protein zeta This gene product belongs to the 14-3-3 family of proteins which mediate signal transduction by binding to phosphoserine-containing proteins. This highly conserved protein family is found in both plants and mammals, and this protein is 99% identical to the mouse, rat and sheep orthologs. The encoded protein interacts with IRS1 protein, suggesting a role in regulating insulin sensitivity. Several transcript variants that differ in the 5’ UTR but that encode the same protein have been identified for this gene. 7534 ENSG00000164924 YWHAZ
cytochrome P450 family 4 subfamily F member 29, pseudogene NA 54055 ENSG00000228314 CYP4F29P
prostate stem cell antigen This gene encodes a glycosylphosphatidylinositol-anchored cell membrane glycoprotein. In addition to being highly expressed in the prostate it is also expressed in the bladder, placenta, colon, kidney, and stomach. This gene is up-regulated in a large proportion of prostate cancers and is also detected in cancers of the bladder and pancreas. This gene includes a polymorphism that results in an upstream start codon in some individuals; this polymorphism is thought to be associated with a risk for certain gastric and bladder cancers. Alternative splicing results in multiple transcript variants. 8000 ENSG00000167653 PSCA
tumor protein p53 inducible protein 3 The protein encoded by this gene is similar to oxidoreductases, which are enzymes involved in cellular responses to oxidative stresses and irradiation. This gene is induced by the tumor suppressor p53 and is thought to be involved in p53-mediated cell death. It contains a p53 consensus binding site in its promoter region and a downstream pentanucleotide microsatellite sequence. P53 has been shown to transcriptionally activate this gene by interacting with the downstream pentanucleotide microsatellite sequence. The microsatellite is polymorphic, with a varying number of pentanucleotide repeats directly correlated with the extent of transcriptional activation by p53. It has been suggested that the microsatellite polymorphism may be associated with differential susceptibility to cancer. Alternatively spliced transcript variants encoding different isoforms have been found for this gene. 9540 ENSG00000115129 TP53I3
claudin 1 Tight junctions represent one mode of cell-to-cell adhesion in epithelial or endothelial cell sheets, forming continuous seals around cells and serving as a physical barrier to prevent solutes and water from passing freely through the paracellular space. These junctions are comprised of sets of continuous networking strands in the outwardly facing cytoplasmic leaflet, with complementary grooves in the inwardly facing extracytoplasmic leaflet. The protein encoded by this gene, a member of the claudin family, is an integral membrane protein and a component of tight junction strands. Loss of function mutations result in neonatal ichthyosis-sclerosing cholangitis syndrome. 9076 ENSG00000163347 CLDN1
claudin 7 This gene encodes a member of the claudin family. Claudins are integral membrane proteins and components of tight junction strands. Tight junction strands serve as a physical barrier to prevent solutes and water from passing freely through the paracellular space between epithelial or endothelial cell sheets, and also play critical roles in maintaining cell polarity and signal transductions. Differential expression of this gene has been observed in different types of malignancies, including breast cancer, ovarian cancer, hepatocellular carcinomas, urinary tumors, prostate cancer, lung cancer, head and neck cancers, thyroid carcinomas, etc. Alternatively spliced transcript variants encoding different isoforms have been found. 1366 ENSG00000181885 CLDN7
annexin A3 This gene encodes a member of the annexin family. Members of this calcium-dependent phospholipid-binding protein family play a role in the regulation of cellular growth and in signal transduction pathways. This protein functions in the inhibition of phospholipase A2 and cleavage of inositol 1,2-cyclic phosphate to form inositol 1-phosphate. This protein may also play a role in anti-coagulation. 306 ENSG00000138772 ANXA3
sterile alpha motif domain containing 9 This gene encodes a sterile alpha motif domain-containing protein. The encoded protein localizes to the cytoplasm and may play a role in regulating cell proliferation and apoptosis. Mutations in this gene are the cause of normophosphatemic familial tumoral calcinosis. Alternate splicing results in multiple transcript variants that encode the same protein. 54809 ENSG00000205413 SAMD9
Kruppel like factor 5 This gene encodes a member of the Kruppel-like factor subfamily of zinc finger proteins. The encoded protein is a transcriptional activator that binds directly to a specific recognition motif in the promoters of target genes. This protein acts downstream of multiple different signaling pathways and is regulated by post-translational modification. It may participate in both promoting and suppressing cell proliferation. Expression of this gene may be changed in a variety of different cancers and in cardiovascular disease. Alternative splicing results in multiple transcript variants. 688 ENSG00000102554 KLF5
V-set and immunoglobulin domain containing 10 like NA 147645 ENSG00000186806 VSIG10L
# Save the Ensembl IDs returned for factor 7, one ID per line
write.table(out$query, paste0("../utilities/GTEX2013_sparse_fac_sqrt/gene_names_clus_", 7, ".txt"),
            col.names = FALSE, row.names = FALSE, quote = FALSE)
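
The export step above is repeated for each factor with the factor index typed in by hand. A minimal sketch of how it could be wrapped in a function is shown here. The helper name `write_factor_genes` and the toy input are assumptions for illustration and are not part of the original analysis. It also drops duplicate and missing IDs, on the assumption that a query can appear more than once in the result.

```r
# Hypothetical helper (not in the original analysis): write the Ensembl IDs
# for one factor to a plain-text file, one ID per line.
write_factor_genes <- function(ids, factor_index, out_dir) {
  ids <- unique(as.character(ids))        # a query may appear more than once
  ids <- ids[!is.na(ids) & nzchar(ids)]   # drop missing or empty IDs
  path <- file.path(out_dir, paste0("gene_names_clus_", factor_index, ".txt"))
  writeLines(ids, path)
  invisible(path)                         # return the file path for later use
}

# Toy input standing in for out$query; the IDs are taken from the Factor 8 table.
toy_ids <- c("ENSG00000163017", "ENSG00000096696", "ENSG00000163017", NA)
path <- write_factor_genes(toy_ids, 8, tempdir())
written <- readLines(path)
```

In the notebook itself the call would presumably be `write_factor_genes(out$query, 8, "../utilities/GTEX2013_sparse_fac_sqrt")`, which keeps the file naming used above.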

Factor 8 Annotations

out <- mygene::queryMany(gene_list[8,],  scopes="ensembl.gene", fields=c("name", "summary", "symbol"), species="human");
## Finished
kable(as.data.frame(out))
X_id summary name symbol query
72 Actins are highly conserved proteins that are involved in various types of cell motility and in the maintenance of the cytoskeleton. Three types of actins, alpha, beta and gamma, have been identified in vertebrates. Alpha actins are found in muscle tissues and are a major constituent of the contractile apparatus. The beta and gamma actins co-exist in most cell types as components of the cytoskeleton and as mediators of internal cell motility. This gene encodes actin gamma 2; a smooth muscle actin found in enteric tissues. Alternative splicing results in multiple transcript variants encoding distinct isoforms. Based on similarity to peptide cleavage of related actins, the mature protein of this gene is formed by removal of two N-terminal peptides. actin, gamma 2, smooth muscle, enteric ACTG2 ENSG00000163017
1832 This gene encodes a protein that anchors intermediate filaments to desmosomal plaques and forms an obligate component of functional desmosomes. Mutations in this gene are the cause of several cardiomyopathies and keratodermas, including skin fragility-woolly hair syndrome. Alternative splicing results in multiple transcript variants. desmoplakin DSP ENSG00000096696
64065 NA PERP, TP53 apoptosis effector PERP ENSG00000112378
3855 The protein encoded by this gene is a member of the keratin gene family. The type II cytokeratins consist of basic or neutral proteins which are arranged in pairs of heterotypic keratin chains coexpressed during differentiation of simple and stratified epithelial tissues. This type II cytokeratin is specifically expressed in the simple epithelia lining the cavities of the internal organs and in the gland ducts and blood vessels. The genes encoding the type II cytokeratins are clustered in a region of chromosome 12q12-q13. Alternative splicing may result in several transcript variants; however, not all variants have been fully described. keratin 7 KRT7 ENSG00000135480
4625 Muscle myosin is a hexameric protein containing 2 heavy chain subunits, 2 alkali light chain subunits, and 2 regulatory light chain subunits. This gene encodes the beta (or slow) heavy chain subunit of cardiac myosin. It is expressed predominantly in normal human ventricle. It is also expressed in skeletal muscle tissues rich in slow-twitch type I muscle fibers. Changes in the relative abundance of this protein and the alpha (or fast) heavy subunit of cardiac myosin correlate with the contractile velocity of cardiac muscle. Its expression is also altered during thyroid hormone depletion and hemodynamic overloading. Mutations in this gene are associated with familial hypertrophic cardiomyopathy, myosin storage myopathy, dilated cardiomyopathy, and Laing early-onset distal myopathy. myosin, heavy chain 7, cardiac muscle, beta MYH7 ENSG00000092054
84525 The protein encoded by this gene is a homeodomain protein that lacks certain conserved residues required for DNA binding. It was reported that choriocarcinoma cell lines and tissues failed to express this gene, which suggested the possible involvement of this gene in malignant conversion of placental trophoblasts. Studies in mice suggest that this protein may interact with serum response factor (SRF) and modulate SRF-dependent cardiac-specific gene expression and cardiac development. Multiple alternatively spliced transcript variants have been identified for this gene. HOP homeobox HOPX ENSG00000171476
125 The protein encoded by this gene is a member of the alcohol dehydrogenase family. Members of this enzyme family metabolize a wide variety of substrates, including ethanol, retinol, other aliphatic alcohols, hydroxysteroids, and lipid peroxidation products. This encoded protein, consisting of several homo- and heterodimers of alpha, beta, and gamma subunits, exhibits high activity for ethanol oxidation and plays a major role in ethanol catabolism. Three genes encoding alpha, beta and gamma subunits are tandemly organized in a genomic segment as a gene cluster. Two transcript variants encoding different isoforms have been found for this gene. alcohol dehydrogenase 1B (class I), beta polypeptide ADH1B ENSG00000196616
1465 This gene encodes a member of the cysteine-rich protein (CSRP) family. This gene family includes a group of LIM domain proteins, which may be involved in regulatory processes important for development and cellular differentiation. The LIM/double zinc-finger motif found in this gene product occurs in proteins with critical functions in gene regulation, cell growth, and somatic differentiation. Alternatively spliced transcript variants have been described. cysteine and glycine rich protein 1 CSRP1 ENSG00000159176
59 The protein encoded by this gene belongs to the actin family of proteins, which are highly conserved proteins that play a role in cell motility, structure and integrity. Alpha, beta and gamma actin isoforms have been identified, with alpha actins being a major constituent of the contractile apparatus, while beta and gamma actins are involved in the regulation of cell motility. This actin is an alpha actin that is found in smooth muscle. Defects in this gene cause aortic aneurysm familial thoracic type 6. Multiple alternatively spliced variants, encoding the same protein, have been identified. actin, alpha 2, smooth muscle, aorta ACTA2 ENSG00000107796
23650 The protein encoded by this gene belongs to the TRIM protein family. It has multiple zinc finger motifs and a leucine zipper motif. It has been proposed to form homo- or heterodimers which are involved in nucleic acid binding. Thus, it may act as a transcriptional regulatory factor involved in carcinogenesis and/or differentiation. It may also function in the suppression of radiosensitivity since it is associated with ataxia telangiectasia phenotype. tripartite motif containing 29 TRIM29 ENSG00000137699
11187 This gene encodes a member of the arm-repeat (armadillo) and plakophilin gene families. Plakophilin proteins contain numerous armadillo repeats, localize to cell desmosomes and nuclei, and participate in linking cadherins to intermediate filaments in the cytoskeleton. This protein may act in cellular desmosome-dependent adhesion and signaling pathways. Two transcript variants encoding different isoforms have been found for this gene. plakophilin 3 PKP3 ENSG00000184363
3860 The protein encoded by this gene is a member of the keratin gene family. The keratins are intermediate filament proteins responsible for the structural integrity of epithelial cells and are subdivided into cytokeratins and hair keratins. Most of the type I cytokeratins consist of acidic proteins which are arranged in pairs of heterotypic keratin chains. This type I cytokeratin is paired with keratin 4 and expressed in the suprabasal layers of non-cornified stratified epithelia. Mutations in this gene and keratin 4 have been associated with the autosomal dominant disorder White Sponge Nevus. The type I cytokeratins are clustered in a region of chromosome 17q21.2. Alternative splicing of this gene results in multiple transcript variants; however, not all variants have been described. keratin 13 KRT13 ENSG00000171401
53905 The protein encoded by this gene is a glycoprotein and a member of the NADPH oxidase family. The synthesis of thyroid hormone is catalyzed by a protein complex located at the apical membrane of thyroid follicular cells. This complex contains an iodide transporter, thyroperoxidase, and a peroxide generating system that includes proteins encoded by this gene and the similar DUOX2 gene. This protein is known as dual oxidase because it has both a peroxidase homology domain and a gp91phox domain. This protein generates hydrogen peroxide and thereby plays a role in the activity of thyroid peroxidase, lactoperoxidase, and in lactoperoxidase-mediated antimicrobial defense at mucosal surfaces. Two alternatively spliced transcript variants encoding the same protein have been described for this gene. dual oxidase 1 DUOX1 ENSG00000137857
960 The protein encoded by this gene is a cell-surface glycoprotein involved in cell-cell interactions, cell adhesion and migration. It is a receptor for hyaluronic acid (HA) and can also interact with other ligands, such as osteopontin, collagens, and matrix metalloproteinases (MMPs). This protein participates in a wide variety of cellular functions including lymphocyte activation, recirculation and homing, hematopoiesis, and tumor metastasis. Transcripts for this gene undergo complex alternative splicing that results in many functionally distinct isoforms, however, the full length nature of some of these variants has not been determined. Alternative splicing is the basis for the structural and functional diversity of this protein, and may be related to tumor metastasis. CD44 molecule (Indian blood group) CD44 ENSG00000026508
54869 This gene encodes a protein that is related to epidermal growth factor receptor pathway substrate 8 (EPS8), a substrate for the epidermal growth factor receptor. The function of this protein is unknown. At least two alternatively spliced transcript variants encoding different isoforms have been found for this gene. EPS8 like 1 EPS8L1 ENSG00000131037
6288 This gene encodes a member of the serum amyloid A family of apolipoproteins. The encoded preproprotein is proteolytically processed to generate the mature protein. This protein is a major acute phase protein that is highly expressed in response to inflammation and tissue injury. This protein also plays an important role in HDL metabolism and cholesterol homeostasis. High levels of this protein are associated with chronic inflammatory diseases including atherosclerosis, rheumatoid arthritis, Alzheimer’s disease and Crohn’s disease. This protein may also be a potential biomarker for certain tumors. Alternate splicing results in multiple transcript variants that encode the same protein. A pseudogene of this gene is found on chromosome 11. serum amyloid A1 SAA1 ENSG00000173432
5317 This gene encodes a member of the arm-repeat (armadillo) and plakophilin gene families. Plakophilin proteins contain numerous armadillo repeats, localize to cell desmosomes and nuclei, and participate in linking cadherins to intermediate filaments in the cytoskeleton. This protein may be involved in molecular recruitment and stabilization during desmosome formation. Mutations in this gene have been associated with the ectodermal dysplasia/skin fragility syndrome. Two transcript variants encoding different isoforms have been found for this gene. plakophilin 1 PKP1 ENSG00000081277
2752 The protein encoded by this gene belongs to the glutamine synthetase family. It catalyzes the synthesis of glutamine from glutamate and ammonia in an ATP-dependent reaction. This protein plays a role in ammonia and glutamate detoxification, acid-base homeostasis, cell signaling, and cell proliferation. Glutamine is an abundant amino acid, and is important to the biosynthesis of several amino acids, pyrimidines, and purines. Mutations in this gene are associated with congenital glutamine deficiency, and overexpression of this gene was observed in some primary liver cancer samples. There are six pseudogenes of this gene found on chromosomes 2, 5, 9, 11, and 12. Alternative splicing results in multiple transcript variants. glutamate-ammonia ligase GLUL ENSG00000135821
9289 This gene encodes a member of the G protein-coupled receptor family and regulates brain cortical patterning. The encoded protein binds specifically to transglutaminase 2, a component of tissue and tumor stroma implicated as an inhibitor of tumor progression. Mutations in this gene are associated with a brain malformation known as bilateral frontoparietal polymicrogyria. Alternative splicing results in multiple transcript variants. adhesion G protein-coupled receptor G1 ADGRG1 ENSG00000205336
6319 This gene encodes an enzyme involved in fatty acid biosynthesis, primarily the synthesis of oleic acid. The protein belongs to the fatty acid desaturase family and is an integral membrane protein located in the endoplasmic reticulum. Transcripts of approximately 3.9 and 5.2 kb, differing only by alternative polyadenylation signals, have been detected. A gene encoding a similar enzyme is located on chromosome 4 and a pseudogene of this gene is located on chromosome 17. stearoyl-CoA desaturase SCD ENSG00000099194
25946 Zinc finger proteins, such as ZNF385A, are regulatory proteins that act as transcription factors, bind single- or double-stranded RNA, or interact with other proteins (Sharma et al., 2004 [PubMed 15527981]). zinc finger protein 385A ZNF385A ENSG00000161642
5265 The protein encoded by this gene is secreted and is a serine protease inhibitor whose targets include elastase, plasmin, thrombin, trypsin, chymotrypsin, and plasminogen activator. Defects in this gene can cause emphysema or liver disease. Several transcript variants encoding the same protein have been found for this gene. serpin family A member 1 SERPINA1 ENSG00000197249
374897 NA suprabasin SBSN ENSG00000189001
93099 This gene is upregulated in inflammatory diseases, and it was first observed as expressed in the differentiated layers of skin. The most interesting aspect of this gene is the differential use of promoters and terminators to generate isoforms with unique cellular distributions and domain components. Alternatively spliced transcript variants encoding different isoforms have been identified for this gene. dermokine DMKN ENSG00000161249
57111 The protein encoded by this gene is a member of the RAS superfamily of small GTPases. The encoded protein is involved in membrane trafficking and cell survival. This gene has been found to be a tumor suppressor and an oncogene, depending on the context. Two variants, one protein-coding and the other not, have been found for this gene. RAB25, member RAS oncogene family RAB25 ENSG00000132698
7038 Thyroglobulin (Tg) is a glycoprotein homodimer produced predominantly by the thyroid gland. It acts as a substrate for the synthesis of thyroxine and triiodothyronine as well as the storage of the inactive forms of thyroid hormone and iodine. Thyroglobulin is secreted from the endoplasmic reticulum to its site of iodination, and subsequent thyroxine biosynthesis, in the follicular lumen. Mutations in this gene cause thyroid dyshormonogenesis, manifested as goiter, and are associated with moderate to severe congenital hypothyroidism. Polymorphisms in this gene are associated with susceptibility to autoimmune thyroid diseases (AITD) such as Graves disease and Hashimoto thyroiditis. thyroglobulin TG ENSG00000042832
171024 NA synaptopodin 2 SYNPO2 ENSG00000172403
2261 This gene encodes a member of the fibroblast growth factor receptor (FGFR) family, with its amino acid sequence being highly conserved between members and among divergent species. FGFR family members differ from one another in their ligand affinities and tissue distribution. A full-length representative protein would consist of an extracellular region, composed of three immunoglobulin-like domains, a single hydrophobic membrane-spanning segment and a cytoplasmic tyrosine kinase domain. The extracellular portion of the protein interacts with fibroblast growth factors, setting in motion a cascade of downstream signals, ultimately influencing mitogenesis and differentiation. This particular family member binds acidic and basic fibroblast growth hormone and plays a role in bone development and maintenance. Mutations in this gene lead to craniosynostosis and multiple types of skeletal dysplasia. Three alternatively spliced transcript variants that encode different protein isoforms have been described. fibroblast growth factor receptor 3 FGFR3 ENSG00000068078
5617 This gene encodes the anterior pituitary hormone prolactin. This secreted hormone is a growth regulator for many tissues, including cells of the immune system. It may also play a role in cell survival by suppressing apoptosis, and it is essential for lactation. Alternative splicing results in multiple transcript variants that encode the same protein. prolactin PRL ENSG00000172179
1158 The protein encoded by this gene is a cytoplasmic enzyme involved in energy homeostasis and is an important serum marker for myocardial infarction. The encoded protein reversibly catalyzes the transfer of phosphate between ATP and various phosphogens such as creatine phosphate. It acts as a homodimer in striated muscle as well as in other tissues, and as a heterodimer with a similar brain isozyme in heart. The encoded protein is a member of the ATP:guanido phosphotransferase protein family. creatine kinase, M-type CKM ENSG00000104879
1843 The expression of DUSP1 gene is induced in human skin fibroblasts by oxidative/heat stress and growth factors. It specifies a protein with structural features similar to members of the non-receptor-type protein-tyrosine phosphatase family, and which has significant amino-acid sequence similarity to a Tyr/Ser-protein phosphatase encoded by the late gene H1 of vaccinia virus. The bacterially expressed and purified DUSP1 protein has intrinsic phosphatase activity, and specifically inactivates mitogen-activated protein (MAP) kinase in vitro by the concomitant dephosphorylation of both its phosphothreonine and phosphotyrosine residues. Furthermore, it suppresses the activation of MAP kinase by oncogenic ras in extracts of Xenopus oocytes. Thus, DUSP1 may play an important role in the human cellular response to environmental stress as well as in the negative regulation of cellular proliferation. dual specificity phosphatase 1 DUSP1 ENSG00000120129
4070 This intronless gene encodes a carcinoma-associated antigen. This antigen is a cell surface receptor that transduces calcium signals. Mutations of this gene have been associated with gelatinous drop-like corneal dystrophy. tumor-associated calcium signal transducer 2 TACSTD2 ENSG00000184292
2597 This gene encodes a member of the glyceraldehyde-3-phosphate dehydrogenase protein family. The encoded protein has been identified as a moonlighting protein based on its ability to perform mechanistically distinct functions. The product of this gene catalyzes an important energy-yielding step in carbohydrate metabolism, the reversible oxidative phosphorylation of glyceraldehyde-3-phosphate in the presence of inorganic phosphate and nicotinamide adenine dinucleotide (NAD). The encoded protein has additionally been identified to have uracil DNA glycosylase activity in the nucleus. Also, this protein contains a peptide that has antimicrobial activity against E. coli, P. aeruginosa, and C. albicans. Studies of a similar protein in mouse have assigned a variety of additional functions including nitrosylation of nuclear proteins, the regulation of mRNA stability, and acting as a transferrin receptor on the cell surface of macrophage. Many pseudogenes similar to this locus are present in the human genome. Alternative splicing results in multiple transcript variants. glyceraldehyde-3-phosphate dehydrogenase GAPDH ENSG00000111640
213 Albumin is a soluble, monomeric protein which comprises about one-half of the blood serum protein. Albumin functions primarily as a carrier protein for steroids, fatty acids, and thyroid hormones and plays a role in stabilizing extracellular fluid volume. Albumin is a globular unglycosylated serum protein of molecular weight 65,000. Albumin is synthesized in the liver as preproalbumin which has an N-terminal peptide that is removed before the nascent protein is released from the rough endoplasmic reticulum. The product, proalbumin, is in turn cleaved in the Golgi vesicles to produce the secreted albumin. albumin ALB ENSG00000163631
3309 The protein encoded by this gene is a member of the heat shock protein 70 (HSP70) family. It is localized in the lumen of the endoplasmic reticulum (ER), and is involved in the folding and assembly of proteins in the ER. As this protein interacts with many ER proteins, it may play a key role in monitoring protein transport through the cell. heat shock protein family A (Hsp70) member 5 HSPA5 ENSG00000044574
1401 The protein encoded by this gene belongs to the pentaxin family. It is involved in several host defense related functions based on its ability to recognize foreign pathogens and damaged cells of the host and to initiate their elimination by interacting with humoral and cellular effector systems in the blood. Consequently, the level of this protein in plasma increases greatly during acute phase response to tissue injury, infection, or other inflammatory stimuli. C-reactive protein, pentraxin-related CRP ENSG00000132693
7169 This gene encodes beta-tropomyosin, a member of the actin filament binding protein family, and mainly expressed in slow, type 1 muscle fibers. Mutations in this gene can alter the expression of other sarcomeric tropomyosin proteins, and cause cap disease, nemaline myopathy and distal arthrogryposis syndromes. Alternatively spliced transcript variants encoding different isoforms have been found for this gene. tropomyosin 2 (beta) TPM2 ENSG00000198467
2243 This gene encodes the alpha subunit of the coagulation factor fibrinogen, which is a component of the blood clot. Following vascular injury, the encoded preproprotein is proteolytically processed by thrombin during the conversion of fibrinogen to fibrin. Mutations in this gene lead to several disorders, including dysfibrinogenemia, hypofibrinogenemia, afibrinogenemia and renal amyloidosis. Alternative splicing results in multiple transcript variants, at least one of which encodes an isoform that undergoes proteolytic processing. fibrinogen alpha chain FGA ENSG00000171560
3880 The protein encoded by this gene is a member of the keratin family. The keratins are intermediate filament proteins responsible for the structural integrity of epithelial cells and are subdivided into cytokeratins and hair keratins. The type I cytokeratins consist of acidic proteins which are arranged in pairs of heterotypic keratin chains. Unlike its related family members, this smallest known acidic cytokeratin is not paired with a basic cytokeratin in epithelial cells. It is specifically expressed in the periderm, the transiently superficial layer that envelopes the developing epidermis. The type I cytokeratins are clustered in a region of chromosome 17q12-q21. keratin 19 KRT19 ENSG00000171345
7094 This gene encodes a cytoskeletal protein that is concentrated in areas of cell-substratum and cell-cell contacts. The encoded protein plays a significant role in the assembly of actin filaments and in spreading and migration of various cell types, including fibroblasts and osteoclasts. It codistributes with integrins in the cell surface membrane in order to assist in the attachment of adherent cells to extracellular matrices and of lymphocytes to other cells. The N-terminus of this protein contains elements for localization to cell-extracellular matrix junctions. The C-terminus contains binding sites for proteins such as beta-1-integrin, actin, and vinculin. talin 1 TLN1 ENSG00000137076
3848 The protein encoded by this gene is a member of the keratin gene family. The type II cytokeratins consist of basic or neutral proteins which are arranged in pairs of heterotypic keratin chains coexpressed during differentiation of simple and stratified epithelial tissues. This type II cytokeratin is specifically expressed in the spinous and granular layers of the epidermis with family member KRT10 and mutations in these genes have been associated with bullous congenital ichthyosiform erythroderma. The type II cytokeratins are clustered in a region of chromosome 12q12-q13. keratin 1 KRT1 ENSG00000167768
2697 This gene is a member of the connexin gene family. The encoded protein is a component of gap junctions, which are composed of arrays of intercellular channels that provide a route for the diffusion of low molecular weight materials from cell to cell. The encoded protein is the major protein of gap junctions in the heart that are thought to have a crucial role in the synchronized contraction of the heart and in embryonic development. A related intronless pseudogene has been mapped to chromosome 5. Mutations in this gene have been associated with oculodentodigital dysplasia, autosomal recessive craniometaphyseal dysplasia and heart malformations. gap junction protein alpha 1 GJA1 ENSG00000152661
83959 This gene encodes a voltage-regulated, electrogenic sodium-coupled borate cotransporter that is essential for borate homeostasis, cell growth and cell proliferation. Mutations in this gene have been associated with a number of endothelial corneal dystrophies including recessive corneal endothelial dystrophy 2, corneal dystrophy and perceptive deafness, and Fuchs endothelial corneal dystrophy. Multiple transcript variants encoding different isoforms have been described. solute carrier family 4 member 11 SLC4A11 ENSG00000088836
3960 The galectins are a family of beta-galactoside-binding proteins implicated in modulating cell-cell and cell-matrix interactions. The expression of this gene is restricted to small intestine, colon, and rectum, and it is underexpressed in colorectal cancer. galectin 4 LGALS4 ENSG00000171747
4359 This gene is specifically expressed in Schwann cells of the peripheral nervous system and encodes a type I transmembrane glycoprotein that is a major structural protein of the peripheral myelin sheath. The encoded protein contains a large hydrophobic extracellular domain and a smaller basic intracellular domain, which are essential for the formation and stabilization of the multilamellar structure of the compact myelin. Mutations in this gene are associated with autosomal dominant form of Charcot-Marie-Tooth disease type 1 (CMT1B) and other polyneuropathies, such as Dejerine-Sottas syndrome (DSS) and congenital hypomyelinating neuropathy (CHN). A recent study showed that two isoforms are produced from the same mRNA by use of alternative in-frame translation termination codons via a stop codon readthrough mechanism. myelin protein zero MPZ ENSG00000158887
476 The protein encoded by this gene belongs to the family of P-type cation transport ATPases, and to the subfamily of Na+/K+ -ATPases. Na+/K+ -ATPase is an integral membrane protein responsible for establishing and maintaining the electrochemical gradients of Na and K ions across the plasma membrane. These gradients are essential for osmoregulation, for sodium-coupled transport of a variety of organic and inorganic molecules, and for electrical excitability of nerve and muscle. This enzyme is composed of two subunits, a large catalytic subunit (alpha) and a smaller glycoprotein subunit (beta). The catalytic subunit of Na+/K+ -ATPase is encoded by multiple genes. This gene encodes an alpha 1 subunit. Multiple transcript variants encoding different isoforms have been found for this gene. ATPase Na+/K+ transporting subunit alpha 1 ATP1A1 ENSG00000163399
7056 The protein encoded by this intronless gene is an endothelial-specific type I membrane receptor that binds thrombin. This binding results in the activation of protein C, which degrades clotting factors Va and VIIIa and reduces the amount of thrombin generated. Mutations in this gene are a cause of thromboembolic disease, also known as inherited thrombophilia. thrombomodulin THBD ENSG00000178726
70 Actins are highly conserved proteins that are involved in various types of cell motility. Polymerization of globular actin (G-actin) leads to a structural filament (F-actin) in the form of a two-stranded helix. Each actin can bind to four others. The protein encoded by this gene belongs to the actin family which is comprised of three main groups of actin isoforms, alpha, beta, and gamma. The alpha actins are found in muscle tissues and are a major constituent of the contractile apparatus. Defects in this gene have been associated with idiopathic dilated cardiomyopathy (IDC) and familial hypertrophic cardiomyopathy (FHC). actin, alpha, cardiac muscle 1 ACTC1 ENSG00000159251
388533 This gene encodes a protein which may function in the regulation of keratinocyte differentiation and maintenance of stratified epithelia. Multiple transcript variants encoding different isoforms have been found for this gene. keratinocyte differentiation associated protein KRTDAP ENSG00000188508
10653 This gene encodes a transmembrane protein with two extracellular Kunitz domains that inhibits a variety of serine proteases. The protein inhibits HGF activator which prevents the formation of active hepatocyte growth factor. This gene is a putative tumor suppressor, and mutations in this gene result in congenital sodium diarrhea. Multiple transcript variants encoding different isoforms have been found for this gene. serine peptidase inhibitor, Kunitz type, 2 SPINT2 ENSG00000167642
2335 This gene encodes fibronectin, a glycoprotein present in a soluble dimeric form in plasma, and in a dimeric or multimeric form at the cell surface and in extracellular matrix. The encoded preproprotein is proteolytically processed to generate the mature protein. Fibronectin is involved in cell adhesion and migration processes including embryogenesis, wound healing, blood coagulation, host defense, and metastasis. The gene has three regions subject to alternative splicing, with the potential to produce 20 different transcript variants, at least one of which encodes an isoform that undergoes proteolytic processing. The full-length nature of some variants has not been determined. fibronectin 1 FN1 ENSG00000115414
6285 The protein encoded by this gene is a member of the S100 family of proteins containing 2 EF-hand calcium-binding motifs. S100 proteins are localized in the cytoplasm and/or nucleus of a wide range of cells, and involved in the regulation of a number of cellular processes such as cell cycle progression and differentiation. S100 genes include at least 13 members which are located as a cluster on chromosome 1q21; however, this gene is located at 21q22.3. This protein may function in neurite extension, proliferation of melanoma cells, stimulation of Ca2+ fluxes, inhibition of PKC-mediated phosphorylation, astrocytosis and axonal proliferation, and inhibition of microtubule assembly. Chromosomal rearrangements and altered expression of this gene have been implicated in several neurological, neoplastic, and other types of diseases, including Alzheimer’s disease, Down’s syndrome, epilepsy, amyotrophic lateral sclerosis, melanoma, and type I diabetes. S100 calcium binding protein B S100B ENSG00000160307
8428 This gene encodes a serine/threonine protein kinase that functions upstream of mitogen-activated protein kinase (MAPK) signaling. The encoded protein is cleaved into two chains by caspases; the N-terminal fragment (MST3/N) translocates to the nucleus and promotes programmed cell death. There is a pseudogene for this gene on chromosome X. Alternative splicing results in multiple transcript variants. serine/threonine kinase 24 STK24 ENSG00000102572
ENSG00000225630 NA mitochondrially encoded NADH:ubiquinone oxidoreductase core subunit 2 pseudogene 28 MTND2P28 ENSG00000225630
9314 This gene encodes a protein that belongs to the Kruppel family of transcription factors. The encoded zinc finger protein is required for normal development of the barrier function of skin. The encoded protein is thought to control the G1-to-S transition of the cell cycle following DNA damage by mediating the tumor suppressor gene p53. Mice lacking this gene have a normal appearance but lose weight rapidly, and die shortly after birth due to fluid evaporation resulting from compromised epidermal barrier function. Alternative splicing results in multiple transcript variants encoding different isoforms. Kruppel like factor 4 KLF4 ENSG00000136826
2688 The protein encoded by this gene is a member of the somatotropin/prolactin family of hormones which play an important role in growth control. The gene, along with four other related genes, is located at the growth hormone locus on chromosome 17 where they are interspersed in the same transcriptional orientation; an arrangement which is thought to have evolved by a series of gene duplications. The five genes share a remarkably high degree of sequence identity. Alternative splicing generates additional isoforms of each of the five growth hormones, leading to further diversity and potential for specialization. This particular family member is expressed in the pituitary but not in placental tissue as is the case for the other four genes in the growth hormone locus. Mutations in or deletions of the gene lead to growth hormone deficiency and short stature. growth hormone 1 GH1 ENSG00000259384
360 This gene encodes the water channel protein aquaporin 3. Aquaporins are a family of small integral membrane proteins related to the major intrinsic protein, also known as aquaporin 0. Aquaporin 3 is localized at the basal lateral membranes of collecting duct cells in the kidney. In addition to its water channel function, aquaporin 3 has been found to facilitate the transport of nonionic small solutes such as urea and glycerol, but to a smaller degree. It has been suggested that water channels can be functionally heterogeneous and possess water and solute permeation mechanisms. Alternative splicing of this gene results in multiple transcript variants encoding different isoforms. aquaporin 3 (Gill blood group) AQP3 ENSG00000165272
54739 This gene encodes a protein which binds to and counteracts the inhibitory effect of a member of the IAP (inhibitor of apoptosis) protein family. IAP proteins bind to and inhibit caspases which are activated during apoptosis. The proportion of IAPs and proteins which interfere with their activity, such as the encoded protein, affect the progress of the apoptosis signaling pathway. Multiple transcript variants encoding different isoforms have been found for this gene. XIAP associated factor 1 XAF1 ENSG00000132530
1191 The protein encoded by this gene is a secreted chaperone that can under some stress conditions also be found in the cell cytosol. It has been suggested to be involved in several basic biological events such as cell death, tumor progression, and neurodegenerative disorders. Alternate splicing results in both coding and non-coding variants. clusterin CLU ENSG00000120885
10135 This gene encodes a protein that catalyzes the condensation of nicotinamide with 5-phosphoribosyl-1-pyrophosphate to yield nicotinamide mononucleotide, one step in the biosynthesis of nicotinamide adenine dinucleotide. The protein belongs to the nicotinic acid phosphoribosyltransferase (NAPRTase) family and is thought to be involved in many important biological processes, including metabolism, stress response and aging. This gene has a pseudogene on chromosome 10. nicotinamide phosphoribosyltransferase NAMPT ENSG00000105835
220323 NA out at first homolog OAF ENSG00000184232
50649 Rho GTPases play a fundamental role in numerous cellular processes that are initiated by extracellular stimuli that work through G protein coupled receptors. The protein encoded by this gene may form complex with G proteins and stimulate Rho-dependent signals. Multiple alternatively spliced transcript variants encoding different isoforms have been found, but the full-length nature of some variants has not been determined. Rho guanine nucleotide exchange factor 4 ARHGEF4 ENSG00000136002
2266 The protein encoded by this gene is the gamma component of fibrinogen, a blood-borne glycoprotein comprised of three pairs of nonidentical polypeptide chains. Following vascular injury, fibrinogen is cleaved by thrombin to form fibrin which is the most abundant component of blood clots. In addition, various cleavage products of fibrinogen and fibrin regulate cell adhesion and spreading, display vasoconstrictor and chemotactic activities, and are mitogens for several cell types. Mutations in this gene lead to several disorders, including dysfibrinogenemia, hypofibrinogenemia and thrombophilia. Alternative splicing results in transcript variants encoding different isoforms. fibrinogen gamma chain FGG ENSG00000171557
5919 This gene encodes a secreted chemotactic protein that initiates chemotaxis via the ChemR23 G protein-coupled seven-transmembrane domain ligand. Expression of this gene is upregulated by the synthetic retinoid tazarotene and occurs in a wide variety of tissues. The active protein has several roles, including that as an adipokine and as an antimicrobial protein with activity against bacteria and fungi. retinoic acid receptor responder 2 RARRES2 ENSG00000106538
7534 This gene product belongs to the 14-3-3 family of proteins which mediate signal transduction by binding to phosphoserine-containing proteins. This highly conserved protein family is found in both plants and mammals, and this protein is 99% identical to the mouse, rat and sheep orthologs. The encoded protein interacts with IRS1 protein, suggesting a role in regulating insulin sensitivity. Several transcript variants that differ in the 5’ UTR but that encode the same protein have been identified for this gene. tyrosine 3-monooxygenase/tryptophan 5-monooxygenase activation protein zeta YWHAZ ENSG00000164924
1292 This gene encodes one of the three alpha chains of type VI collagen, a beaded filament collagen found in most connective tissues. The product of this gene contains several domains similar to von Willebrand Factor type A domains. These domains have been shown to bind extracellular matrix proteins, an interaction that explains the importance of this collagen in organizing matrix components. Mutations in this gene are associated with Bethlem myopathy and Ullrich scleroatonic muscular dystrophy. Three transcript variants have been identified for this gene. collagen type VI alpha 2 COL6A2 ENSG00000142173
134147 CMBL (EC 3.1.1.45) is a cysteine hydrolase of the dienelactone hydrolase family that is highly expressed in liver cytosol. CMBL preferentially cleaves cyclic esters, and it activates medoxomil-ester prodrugs in which the medoxomil moiety is linked to an oxygen atom (Ishizuka et al., 2010 [PubMed 20177059]). carboxymethylenebutenolidase homolog (Pseudomonas) CMBL ENSG00000164237
5004 This gene encodes a key acute phase plasma protein. Because of its increase due to acute inflammation, this protein is classified as an acute-phase reactant. The specific function of this protein has not yet been determined; however, it may be involved in aspects of immunosuppression. orosomucoid 1 ORM1 ENSG00000229314
57402 This gene encodes a member of the S100 protein family which contains an EF-hand motif and binds calcium. The gene is located in a cluster of S100 genes on chromosome 1. Levels of the encoded protein have been found to be lower in cancerous tissue and associated with metastasis suggesting a tumor suppressor function (PMID: 19956863, 19351828). S100 calcium binding protein A14 S100A14 ENSG00000189334
488 This gene encodes one of the SERCA Ca(2+)-ATPases, which are intracellular pumps located in the sarcoplasmic or endoplasmic reticula of muscle cells. This enzyme catalyzes the hydrolysis of ATP coupled with the translocation of calcium from the cytosol into the sarcoplasmic reticulum lumen, and is involved in regulation of the contraction/relaxation cycle. Mutations in this gene cause Darier-White disease, also known as keratosis follicularis, an autosomal dominant skin disorder characterized by loss of adhesion between epidermal cells and abnormal keratinization. Alternative splicing results in multiple transcript variants encoding different isoforms. ATPase sarcoplasmic/endoplasmic reticulum Ca2+ transporting 2 ATP2A2 ENSG00000174437
ENSG00000180139 NA ACTA2 antisense RNA 1 ACTA2-AS1 ENSG00000180139
3512 NA joining chain of multimeric IgA and IgM JCHAIN ENSG00000132465
9620 The protein encoded by this gene is a member of the flamingo subfamily, part of the cadherin superfamily. The flamingo subfamily consists of nonclassic-type cadherins; a subpopulation that does not interact with catenins. The flamingo cadherins are located at the plasma membrane and have nine cadherin domains, seven epidermal growth factor-like repeats and two laminin A G-type repeats in their ectodomain. They also have seven transmembrane domains, a characteristic unique to this subfamily. It is postulated that these proteins are receptors involved in contact-mediated communication, with cadherin domains acting as homophilic binding regions and the EGF-like domains involved in cell adhesion and receptor-ligand interactions. This particular member is a developmentally regulated, neural-specific gene which plays an unspecified role in early embryogenesis. cadherin EGF LAG seven-pass G-type receptor 1 CELSR1 ENSG00000075275
10057 The protein encoded by this gene is a member of the superfamily of ATP-binding cassette (ABC) transporters. ABC proteins transport various molecules across extra- and intra-cellular membranes. ABC genes are divided into seven distinct subfamilies (ABC1, MDR/TAP, MRP, ALD, OABP, GCN20, White). This protein is a member of the MRP subfamily which is involved in multi-drug resistance. This protein functions in the cellular export of its substrate, cyclic nucleotides. This export contributes to the degradation of phosphodiesterases and possibly an elimination pathway for cyclic nucleotides. Studies show that this protein provides resistance to thiopurine anticancer drugs, 6-mercaptopurine and thioguanine, and the anti-HIV drug 9-(2-phosphonylmethoxyethyl)adenine. This protein may be involved in resistance to thiopurines in acute lymphoblastic leukemia and antiretroviral nucleoside analogs in HIV-infected patients. Alternative splicing results in multiple transcript variants. ATP binding cassette subfamily C member 5 ABCC5 ENSG00000114770
5950 This protein belongs to the lipocalin family and is the specific carrier for retinol (vitamin A alcohol) in the blood. It delivers retinol from the liver stores to the peripheral tissues. In plasma, the RBP-retinol complex interacts with transthyretin which prevents its loss by filtration through the kidney glomeruli. A deficiency of vitamin A blocks secretion of the binding protein posttranslationally and results in defective delivery and supply to the epidermal cells. retinol binding protein 4 RBP4 ENSG00000138207
149428 The protein encoded by this gene interacts with several other proteins, such as BCL2, ARHGAP1, MIF and GFER. It may function as a bridge molecule between BCL2 and ARHGAP1/CDC42 in promoting cell death. Alternatively spliced transcript variants encoding different isoforms have been described for this gene. BCL2/adenovirus E1B 19kD interacting protein like BNIPL ENSG00000163141
2244 The protein encoded by this gene is the beta component of fibrinogen, a blood-borne glycoprotein comprised of three pairs of nonidentical polypeptide chains. Following vascular injury, fibrinogen is cleaved by thrombin to form fibrin which is the most abundant component of blood clots. In addition, various cleavage products of fibrinogen and fibrin regulate cell adhesion and spreading, display vasoconstrictor and chemotactic activities, and are mitogens for several cell types. Mutations in this gene lead to several disorders, including afibrinogenemia, dysfibrinogenemia, hypodysfibrinogenemia and thrombotic tendency. Alternatively spliced transcript variants encoding different isoforms have been found for this gene. fibrinogen beta chain FGB ENSG00000171564
1360 Three different procarboxypeptidases A and two different procarboxypeptidases B have been isolated. The B1 and B2 forms differ from each other mainly in isoelectric point. Carboxypeptidase B1 is a highly tissue-specific protein and is a useful serum marker for acute pancreatitis and dysfunction of pancreatic transplants. It is not elevated in pancreatic carcinoma. carboxypeptidase B1 CPB1 ENSG00000153002
25900 This gene is a member of the intermediate filament family. Intermediate filaments are proteins which are primordial components of the cytoskeleton and nuclear envelope. The proteins encoded by the members of this gene family are evolutionarily and structurally related but have limited sequence homology, with the exception of the central rod domain. Multiple alternatively spliced transcript variants encoding different isoforms have been found for this gene. intermediate filament family orphan 1 IFFO1 ENSG00000010295
23344 NA extended synaptotagmin protein 1 ESYT1 ENSG00000139641
58498 NA myosin light chain 7 MYL7 ENSG00000106631
29842 NA transcription factor CP2-like 1 TFCP2L1 ENSG00000115112
ENSG00000261054 NA NA RP11-6O2.4 ENSG00000261054
2052 Epoxide hydrolase is a critical biotransformation enzyme that converts epoxides from the degradation of aromatic compounds to trans-dihydrodiols which can be conjugated and excreted from the body. Epoxide hydrolase functions in both the activation and detoxification of epoxides. Mutations in this gene cause preeclampsia, epoxide hydrolase deficiency or increased epoxide hydrolase activity. Alternatively spliced transcript variants encoding the same protein have been found for this gene. epoxide hydrolase 1 EPHX1 ENSG00000143819
4629 The protein encoded by this gene is a smooth muscle myosin belonging to the myosin heavy chain family. The gene product is a subunit of a hexameric protein that consists of two heavy chain subunits and two pairs of non-identical light chain subunits. It functions as a major contractile protein, converting chemical energy into mechanical energy through the hydrolysis of ATP. The gene encoding a human ortholog of rat NUDE1 is transcribed from the reverse strand of this gene, and its 3’ end overlaps with that of the latter. The pericentric inversion of chromosome 16 [inv(16)(p13q22)] produces a chimeric transcript that encodes a protein consisting of the first 165 residues from the N terminus of core-binding factor beta in a fusion with the C-terminal portion of the smooth muscle myosin heavy chain. This chromosomal rearrangement is associated with acute myeloid leukemia of the M4Eo subtype. Alternative splicing generates isoforms that are differentially expressed, with ratios changing during muscle cell maturation. Alternatively spliced transcript variants encoding different isoforms have been identified. myosin, heavy chain 11, smooth muscle MYH11 ENSG00000133392
ENSG00000229732 NA NA AC019349.5 ENSG00000229732
8766 The protein encoded by this gene belongs to the Rab family of the small GTPase superfamily. It is associated with both constitutive and regulated secretory pathways, and may be involved in protein transport. Two transcript variants encoding different isoforms have been found for this gene. RAB11A, member RAS oncogene family RAB11A ENSG00000103769
84649 This gene encodes one of two enzymes which catalyzes the final reaction in the synthesis of triglycerides in which diacylglycerol is covalently bound to long chain fatty acyl-CoAs. The encoded protein catalyzes this reaction at low concentrations of magnesium chloride while the other enzyme has high activity at high concentrations of magnesium chloride. Multiple transcript variants encoding different isoforms have been found for this gene. diacylglycerol O-acyltransferase 2 DGAT2 ENSG00000062282
229 Fructose-1,6-bisphosphate aldolase (EC 4.1.2.13) is a tetrameric glycolytic enzyme that catalyzes the reversible conversion of fructose-1,6-bisphosphate to glyceraldehyde 3-phosphate and dihydroxyacetone phosphate. Vertebrates have 3 aldolase isozymes which are distinguished by their electrophoretic and catalytic properties. Differences indicate that aldolases A, B, and C are distinct proteins, the products of a family of related ‘housekeeping’ genes exhibiting developmentally regulated expression of the different isozymes. The developing embryo produces aldolase A, which is produced in even greater amounts in adult muscle where it can be as much as 5% of total cellular protein. In adult liver, kidney and intestine, aldolase A expression is repressed and aldolase B is produced. In brain and other nervous tissue, aldolase A and C are expressed about equally. There is a high degree of homology between aldolase A and C. Defects in ALDOB cause hereditary fructose intolerance. aldolase, fructose-bisphosphate B ALDOB ENSG00000136872
7448 The protein encoded by this gene is a member of the pexin family. It is found in serum and tissues and promotes cell adhesion and spreading, inhibits the membrane-damaging effect of the terminal cytolytic complement pathway, and binds to several serpin serine protease inhibitors. It is a secreted protein and exists in either a single chain form or a clipped, two chain form held together by a disulfide bond. vitronectin VTN ENSG00000109072
100528017 This locus represents naturally occurring read-through transcription between the neighboring serum amyloid A2 and serum amyloid A4 genes on chromosome 11. The read-through transcript produces a fusion protein that shares sequence identity with each individual gene product. SAA2-SAA4 readthrough SAA2-SAA4 ENSG00000255071
5208 The protein encoded by this gene is involved in both the synthesis and degradation of fructose-2,6-bisphosphate, a regulatory molecule that controls glycolysis in eukaryotes. The encoded protein has a 6-phosphofructo-2-kinase activity that catalyzes the synthesis of fructose-2,6-bisphosphate, and a fructose-2,6-biphosphatase activity that catalyzes the degradation of fructose-2,6-bisphosphate. This protein regulates fructose-2,6-bisphosphate levels in the heart, while a related enzyme encoded by a different gene regulates fructose-2,6-bisphosphate levels in the liver and muscle. This enzyme functions as a homodimer. Two transcript variants encoding two different isoforms have been found for this gene. 6-phosphofructo-2-kinase/fructose-2,6-biphosphatase 2 PFKFB2 ENSG00000123836
1264 NA calponin 1 CNN1 ENSG00000130176
1281 This gene encodes the pro-alpha1 chains of type III collagen, a fibrillar collagen that is found in extensible connective tissues such as skin, lung, uterus, intestine and the vascular system, frequently in association with type I collagen. Mutations in this gene are associated with Ehlers-Danlos syndrome type IV, and with aortic and arterial aneurysms. Two transcripts, resulting from the use of alternate polyadenylation signals, have been identified for this gene. collagen type III alpha 1 chain COL3A1 ENSG00000168542
23022 This gene encodes a cytoskeletal protein that is required for organizing the actin cytoskeleton. The protein is a component of actin-containing microfilaments, and it is involved in the control of cell shape, adhesion, and contraction. Polymorphisms in this gene are associated with a susceptibility to pancreatic cancer type 1, and also with a risk for myocardial infarction. Alternative splicing results in multiple transcript variants. palladin, cytoskeletal associated protein PALLD ENSG00000129116
7538 NA ZFP36 ring finger protein ZFP36 ENSG00000128016
3849 The protein encoded by this gene is a member of the keratin gene family. The type II cytokeratins consist of basic or neutral proteins which are arranged in pairs of heterotypic keratin chains coexpressed during differentiation of simple and stratified epithelial tissues. This type II cytokeratin is expressed largely in the upper spinous layer of epidermal keratinocytes and mutations in this gene have been associated with bullous congenital ichthyosiform erythroderma. The type II cytokeratins are clustered in a region of chromosome 12q12-q13. keratin 2 KRT2 ENSG00000172867
5187 This gene is a member of the Period family of genes and is expressed in a circadian pattern in the suprachiasmatic nucleus, the primary circadian pacemaker in the mammalian brain. Genes in this family encode components of the circadian rhythms of locomotor activity, metabolism, and behavior. This gene is upregulated by CLOCK/ARNTL heterodimers but then represses this upregulation in a feedback loop using PER/CRY heterodimers to interact with CLOCK/ARNTL. Polymorphisms in this gene may increase the risk of getting certain cancers. Alternative splicing has been observed in this gene; however, these variants have not been fully described. period circadian clock 1 PER1 ENSG00000179094
7173 This gene encodes a membrane-bound glycoprotein. The encoded protein acts as an enzyme and plays a central role in thyroid gland function. The protein functions in the iodination of tyrosine residues in thyroglobulin and phenoxy-ester formation between pairs of iodinated tyrosines to generate the thyroid hormones, thyroxine and triiodothyronine. Mutations in this gene are associated with several disorders of thyroid hormonogenesis, including congenital hypothyroidism, congenital goiter, and thyroid hormone organification defect IIA. Multiple transcript variants encoding distinct isoforms have been identified for this gene, but the full-length nature of some variants has not been determined. thyroid peroxidase TPO ENSG00000115705
4311 This gene encodes a common acute lymphocytic leukemia antigen that is an important cell surface marker in the diagnosis of human acute lymphocytic leukemia (ALL). This protein is present on leukemic cells of pre-B phenotype, which represent 85% of cases of ALL. This protein is not restricted to leukemic cells, however, and is found on a variety of normal tissues. It is a glycoprotein that is particularly abundant in kidney, where it is present on the brush border of proximal tubules and on glomerular epithelium. The protein is a neutral endopeptidase that cleaves peptides at the amino side of hydrophobic residues and inactivates several peptide hormones including glucagon, enkephalins, substance P, neurotensin, oxytocin, and bradykinin. This gene, which encodes a 100-kD type II transmembrane glycoprotein, exists in a single copy of greater than 45 kb. The 5’ untranslated region of this gene is alternatively spliced, resulting in four separate mRNA transcripts. The coding region is not affected by alternative splicing. membrane metallo-endopeptidase MME ENSG00000196549
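# Save the queried Ensembl gene IDs for factor 8, one ID per line, with no header, row names, or quotes.
write.table(as.factor(out$query),
            paste0("../utilities/GTEX2013_sparse_fac_sqrt/gene_names_clus_", 8, ".txt"),
            col.names = FALSE, row.names = FALSE, quote = FALSE)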
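
The annotate-then-save step is repeated for each factor with only the factor index changing, so it could be wrapped in a small helper. The sketch below is an illustration only, not part of the original analysis: the function name `write_factor_genes` and its `drop_notfound` option are hypothetical. It assumes the annotation result has a `query` column and, when some IDs could not be matched, a `notfound` column that is `TRUE` for those rows, as in the tables printed here. It uses a toy data frame rather than a live `mygene::queryMany` call.

```r
# Hypothetical helper (not in the original analysis): write the queried
# Ensembl IDs for one factor to gene_names_clus_<factor_index>.txt.
write_factor_genes <- function(out, factor_index, out_dir, drop_notfound = FALSE) {
  out <- as.data.frame(out)
  ids <- as.character(out$query)
  # Optionally drop IDs flagged as not found (notfound is TRUE for those rows,
  # NA otherwise; the column is absent when every query matched).
  if (drop_notfound && "notfound" %in% colnames(out)) {
    ids <- ids[!(out$notfound %in% TRUE)]
  }
  out_file <- file.path(out_dir, paste0("gene_names_clus_", factor_index, ".txt"))
  write.table(ids, out_file, col.names = FALSE, row.names = FALSE, quote = FALSE)
  invisible(out_file)
}

# Toy input standing in for an annotation result.
toy <- data.frame(query = c("ENSG00000175084", "ENSG00000259716"),
                  notfound = c(NA, TRUE),
                  stringsAsFactors = FALSE)
path <- write_factor_genes(toy, 9, tempdir(), drop_notfound = TRUE)
written <- readLines(path)
```

With `drop_notfound = FALSE` (the default) the helper writes every queried ID, which matches what the `write.table` calls in this report do.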

Factor 9 Annotations

out <- mygene::queryMany(gene_list[9,],  scopes="ensembl.gene", fields=c("name", "summary", "symbol"), species="human");
## Finished
## Pass returnall=TRUE to return lists of duplicate or missing query terms.
kable(as.data.frame(out))
X_id name summary symbol query notfound
1674 desmin This gene encodes a muscle-specific class III intermediate filament. Homopolymers of this protein form a stable intracytoplasmic filamentous network connecting myofibrils to each other and to the plasma membrane. Mutations in this gene are associated with desmin-related myopathy, a familial cardiac and skeletal myopathy (CSM), and with distal myopathies. DES ENSG00000175084 NA
4629 myosin, heavy chain 11, smooth muscle The protein encoded by this gene is a smooth muscle myosin belonging to the myosin heavy chain family. The gene product is a subunit of a hexameric protein that consists of two heavy chain subunits and two pairs of non-identical light chain subunits. It functions as a major contractile protein, converting chemical energy into mechanical energy through the hydrolysis of ATP. The gene encoding a human ortholog of rat NUDE1 is transcribed from the reverse strand of this gene, and its 3’ end overlaps with that of the latter. The pericentric inversion of chromosome 16 [inv(16)(p13q22)] produces a chimeric transcript that encodes a protein consisting of the first 165 residues from the N terminus of core-binding factor beta in a fusion with the C-terminal portion of the smooth muscle myosin heavy chain. This chromosomal rearrangement is associated with acute myeloid leukemia of the M4Eo subtype. Alternative splicing generates isoforms that are differentially expressed, with ratios changing during muscle cell maturation. Alternatively spliced transcript variants encoding different isoforms have been identified. MYH11 ENSG00000133392 NA
2335 fibronectin 1 This gene encodes fibronectin, a glycoprotein present in a soluble dimeric form in plasma, and in a dimeric or multimeric form at the cell surface and in extracellular matrix. The encoded preproprotein is proteolytically processed to generate the mature protein. Fibronectin is involved in cell adhesion and migration processes including embryogenesis, wound healing, blood coagulation, host defense, and metastasis. The gene has three regions subject to alternative splicing, with the potential to produce 20 different transcript variants, at least one of which encodes an isoform that undergoes proteolytic processing. The full-length nature of some variants has not been determined. FN1 ENSG00000115414 NA
213 albumin Albumin is a soluble, monomeric protein which comprises about one-half of the blood serum protein. Albumin functions primarily as a carrier protein for steroids, fatty acids, and thyroid hormones and plays a role in stabilizing extracellular fluid volume. Albumin is a globular unglycosylated serum protein of molecular weight 65,000. Albumin is synthesized in the liver as preproalbumin which has an N-terminal peptide that is removed before the nascent protein is released from the rough endoplasmic reticulum. The product, proalbumin, is in turn cleaved in the Golgi vesicles to produce the secreted albumin. ALB ENSG00000163631 NA
60 actin, beta This gene encodes one of six different actin proteins. Actins are highly conserved proteins that are involved in cell motility, structure, and integrity. This actin is a major constituent of the contractile apparatus and one of the two nonmuscle cytoskeletal actins. ACTB ENSG00000075624 NA
10398 myosin light chain 9 Myosin, a structural component of muscle, consists of two heavy chains and four light chains. The protein encoded by this gene is a myosin light chain that may regulate muscle contraction by modulating the ATPase activity of myosin heads. The encoded protein binds calcium and is activated by myosin light chain kinase. Two transcript variants encoding different isoforms have been found for this gene. MYL9 ENSG00000101335 NA
5265 serpin family A member 1 The protein encoded by this gene is secreted and is a serine protease inhibitor whose targets include elastase, plasmin, thrombin, trypsin, chymotrypsin, and plasminogen activator. Defects in this gene can cause emphysema or liver disease. Several transcript variants encoding the same protein have been found for this gene. SERPINA1 ENSG00000197249 NA
2934 gelsolin The protein encoded by this gene binds to the ‘plus’ ends of actin monomers and filaments to prevent monomer exchange. The encoded calcium-regulated protein functions in both assembly and disassembly of actin filaments. Defects in this gene are a cause of familial amyloidosis Finnish type (FAF). Multiple transcript variants encoding several different isoforms have been found for this gene. GSN ENSG00000148180 NA
2243 fibrinogen alpha chain This gene encodes the alpha subunit of the coagulation factor fibrinogen, which is a component of the blood clot. Following vascular injury, the encoded preproprotein is proteolytically processed by thrombin during the conversion of fibrinogen to fibrin. Mutations in this gene lead to several disorders, including dysfibrinogenemia, hypofibrinogenemia, afibrinogenemia and renal amyloidosis. Alternative splicing results in multiple transcript variants, at least one of which encodes an isoform that undergoes proteolytic processing. FGA ENSG00000171560 NA
2244 fibrinogen beta chain The protein encoded by this gene is the beta component of fibrinogen, a blood-borne glycoprotein comprised of three pairs of nonidentical polypeptide chains. Following vascular injury, fibrinogen is cleaved by thrombin to form fibrin which is the most abundant component of blood clots. In addition, various cleavage products of fibrinogen and fibrin regulate cell adhesion and spreading, display vasoconstrictor and chemotactic activities, and are mitogens for several cell types. Mutations in this gene lead to several disorders, including afibrinogenemia, dysfibrinogenemia, hypodysfibrinogenemia and thrombotic tendency. Alternatively spliced transcript variants encoding different isoforms have been found for this gene. FGB ENSG00000171564 NA
7431 vimentin This gene encodes a member of the intermediate filament family. Intermediate filaments, along with microtubules and actin microfilaments, make up the cytoskeleton. The protein encoded by this gene is responsible for maintaining cell shape, integrity of the cytoplasm, and stabilizing cytoskeletal interactions. It is also involved in the immune response, and controls the transport of low-density lipoprotein (LDL)-derived cholesterol from a lysosome to the site of esterification. It functions as an organizer of a number of critical proteins involved in attachment, migration, and cell signaling. Mutations in this gene cause a dominant, pulverulent cataract. VIM ENSG00000026025 NA
NA NA NA NA ENSG00000259716 TRUE
1465 cysteine and glycine rich protein 1 This gene encodes a member of the cysteine-rich protein (CSRP) family. This gene family includes a group of LIM domain proteins, which may be involved in regulatory processes important for development and cellular differentiation. The LIM/double zinc-finger motif found in this gene product occurs in proteins with critical functions in gene regulation, cell growth, and somatic differentiation. Alternatively spliced transcript variants have been described. CSRP1 ENSG00000159176 NA
4151 myoglobin This gene encodes a member of the globin superfamily and is expressed in skeletal and cardiac muscles. The encoded protein is a haemoprotein contributing to intracellular oxygen storage and transcellular facilitated diffusion of oxygen. At least three alternatively spliced transcript variants encoding the same protein have been reported. MB ENSG00000198125 NA
4625 myosin, heavy chain 7, cardiac muscle, beta Muscle myosin is a hexameric protein containing 2 heavy chain subunits, 2 alkali light chain subunits, and 2 regulatory light chain subunits. This gene encodes the beta (or slow) heavy chain subunit of cardiac myosin. It is expressed predominantly in normal human ventricle. It is also expressed in skeletal muscle tissues rich in slow-twitch type I muscle fibers. Changes in the relative abundance of this protein and the alpha (or fast) heavy subunit of cardiac myosin correlate with the contractile velocity of cardiac muscle. Its expression is also altered during thyroid hormone depletion and hemodynamic overloading. Mutations in this gene are associated with familial hypertrophic cardiomyopathy, myosin storage myopathy, dilated cardiomyopathy, and Laing early-onset distal myopathy. MYH7 ENSG00000092054 NA
2266 fibrinogen gamma chain The protein encoded by this gene is the gamma component of fibrinogen, a blood-borne glycoprotein comprised of three pairs of nonidentical polypeptide chains. Following vascular injury, fibrinogen is cleaved by thrombin to form fibrin which is the most abundant component of blood clots. In addition, various cleavage products of fibrinogen and fibrin regulate cell adhesion and spreading, display vasoconstrictor and chemotactic activities, and are mitogens for several cell types. Mutations in this gene lead to several disorders, including dysfibrinogenemia, hypofibrinogenemia and thrombophilia. Alternative splicing results in transcript variants encoding different isoforms. FGG ENSG00000171557 NA
5004 orosomucoid 1 This gene encodes a key acute phase plasma protein. Because of its increase due to acute inflammation, this protein is classified as an acute-phase reactant. The specific function of this protein has not yet been determined; however, it may be involved in aspects of immunosuppression. ORM1 ENSG00000229314 NA
4638 myosin light chain kinase This gene, a muscle member of the immunoglobulin gene superfamily, encodes myosin light chain kinase which is a calcium/calmodulin dependent enzyme. This kinase phosphorylates myosin regulatory light chains to facilitate myosin interaction with actin filaments to produce contractile activity. This gene encodes both smooth muscle and nonmuscle isoforms. In addition, using a separate promoter in an intron in the 3’ region, it encodes telokin, a small protein identical in sequence to the C-terminus of myosin light chain kinase, that is independently expressed in smooth muscle and functions to stabilize unphosphorylated myosin filaments. A pseudogene is located on the p arm of chromosome 3. Four transcript variants that produce four isoforms of the calcium/calmodulin dependent enzyme have been identified as well as two transcripts that produce two isoforms of telokin. Additional variants have been identified but lack full length transcripts. MYLK ENSG00000065534 NA
7038 thyroglobulin Thyroglobulin (Tg) is a glycoprotein homodimer produced predominantly by the thyroid gland. It acts as a substrate for the synthesis of thyroxine and triiodothyronine as well as the storage of the inactive forms of thyroid hormone and iodine. Thyroglobulin is secreted from the endoplasmic reticulum to its site of iodination, and subsequent thyroxine biosynthesis, in the follicular lumen. Mutations in this gene cause thyroid dyshormonogenesis, manifested as goiter, and are associated with moderate to severe congenital hypothyroidism. Polymorphisms in this gene are associated with susceptibility to autoimmune thyroid diseases (AITD) such as Graves disease and Hashimoto thyroiditis. TG ENSG00000042832 NA
4633 myosin light chain 2 This gene encodes the regulatory light chain associated with cardiac myosin beta (or slow) heavy chain. Ca+ triggers the phosphorylation of regulatory light chain that in turn triggers contraction. Mutations in this gene are associated with mid-left ventricular chamber type hypertrophic cardiomyopathy. MYL2 ENSG00000111245 NA
2199 fibulin 2 This gene encodes an extracellular matrix protein, which belongs to the fibulin family. This protein binds various extracellular ligands and calcium. It may play a role during organ development, in particular, during the differentiation of heart, skeletal and neuronal structures. Alternatively spliced transcript variants encoding different isoforms have been identified. FBLN2 ENSG00000163520 NA
1401 C-reactive protein, pentraxin-related The protein encoded by this gene belongs to the pentaxin family. It is involved in several host defense related functions based on its ability to recognize foreign pathogens and damaged cells of the host and to initiate their elimination by interacting with humoral and cellular effector systems in the blood. Consequently, the level of this protein in plasma increases greatly during acute phase response to tissue injury, infection, or other inflammatory stimuli. CRP ENSG00000132693 NA
151887 coiled-coil domain containing 80 NA CCDC80 ENSG00000091986 NA
4023 lipoprotein lipase LPL encodes lipoprotein lipase, which is expressed in heart, muscle, and adipose tissue. LPL functions as a homodimer, and has the dual functions of triglyceride hydrolase and ligand/bridging factor for receptor-mediated lipoprotein uptake. Severe mutations that cause LPL deficiency result in type I hyperlipoproteinemia, while less extreme mutations in LPL are linked to many disorders of lipoprotein metabolism. LPL ENSG00000175445 NA
87 actinin alpha 1 Alpha actinins belong to the spectrin gene superfamily which represents a diverse group of cytoskeletal proteins, including the alpha and beta spectrins and dystrophins. Alpha actinin is an actin-binding protein with multiple roles in different cell types. In nonmuscle cells, the cytoskeletal isoform is found along microfilament bundles and adherens-type junctions, where it is involved in binding actin to the membrane. In contrast, skeletal, cardiac, and smooth muscle isoforms are localized to the Z-disc and analogous dense bodies, where they help anchor the myofibrillar actin filaments. This gene encodes a nonmuscle, cytoskeletal, alpha actinin isoform and maps to the same site as the structurally similar erythroid beta spectrin gene. Three transcript variants encoding different isoforms have been found for this gene. ACTN1 ENSG00000072110 NA
302 annexin A2 This gene encodes a member of the annexin family. Members of this calcium-dependent phospholipid-binding protein family play a role in the regulation of cellular growth and in signal transduction pathways. This protein functions as an autocrine factor which heightens osteoclast formation and bone resorption. This gene has three pseudogenes located on chromosomes 4, 9 and 10, respectively. Multiple alternatively spliced transcript variants encoding different isoforms have been found for this gene. ANXA2 ENSG00000182718 NA
7145 tensin 1 The protein encoded by this gene localizes to focal adhesions, regions of the plasma membrane where the cell attaches to the extracellular matrix. This protein crosslinks actin filaments and contains a Src homology 2 (SH2) domain, which is often found in molecules involved in signal transduction. This protein is a substrate of calpain II. Alternative splicing results in multiple transcript variants encoding different isoforms. TNS1 ENSG00000079308 NA
1571 cytochrome P450 family 2 subfamily E member 1 This gene encodes a member of the cytochrome P450 superfamily of enzymes. The cytochrome P450 proteins are monooxygenases which catalyze many reactions involved in drug metabolism and synthesis of cholesterol, steroids and other lipids. This protein localizes to the endoplasmic reticulum and is induced by ethanol, the diabetic state, and starvation. The enzyme metabolizes both endogenous substrates, such as ethanol, acetone, and acetal, as well as exogenous substrates including benzene, carbon tetrachloride, ethylene glycol, and nitrosamines which are premutagens found in cigarette smoke. Due to its many substrates, this enzyme may be involved in such varied processes as gluconeogenesis, hepatic cirrhosis, diabetes, and cancer. CYP2E1 ENSG00000130649 NA
27063 ankyrin repeat domain 1 The protein encoded by this gene is localized to the nucleus of endothelial cells and is induced by IL-1 and TNF-alpha stimulation. Studies in rat cardiomyocytes suggest that this gene functions as a transcription factor. Interactions between this protein and the sarcomeric proteins myopalladin and titin suggest that it may also be involved in the myofibrillar stretch-sensor system. ANKRD1 ENSG00000148677 NA
9445 integral membrane protein 2B Amyloid precursor proteins are processed by beta-secretase and gamma-secretase to produce beta-amyloid peptides which form the characteristic plaques of Alzheimer disease. This gene encodes a transmembrane protein which is processed at the C-terminus by furin or furin-like proteases to produce a small secreted peptide which inhibits the deposition of beta-amyloid. Mutations which result in extension of the C-terminal end of the encoded protein, thereby increasing the size of the secreted peptide, are associated with two neurodegenerative diseases, familial British dementia and familial Danish dementia. ITM2B ENSG00000136156 NA
345 apolipoprotein C3 Apolipoprotein C-III is a very low density lipoprotein (VLDL) protein. APOC3 inhibits lipoprotein lipase and hepatic lipase; it is thought to delay catabolism of triglyceride-rich particles. The APOA1, APOC3 and APOA4 genes are closely linked in both rat and human genomes. The A-I and A-IV genes are transcribed from the same strand, while the A-1 and C-III genes are convergently transcribed. An increase in apoC-III levels induces the development of hypertriglyceridemia. APOC3 ENSG00000110245 NA
1278 collagen type I alpha 2 chain This gene encodes the pro-alpha2 chain of type I collagen whose triple helix comprises two alpha1 chains and one alpha2 chain. Type I is a fibril-forming collagen found in most connective tissues and is abundant in bone, cornea, dermis and tendon. Mutations in this gene are associated with osteogenesis imperfecta types I-IV, Ehlers-Danlos syndrome type VIIB, recessive Ehlers-Danlos syndrome Classical type, idiopathic osteoporosis, and atypical Marfan syndrome. Symptoms associated with mutations in this gene, however, tend to be less severe than mutations in the gene for the alpha1 chain of type I collagen (COL1A1) reflecting the different role of alpha2 chains in matrix integrity. Three transcripts, resulting from the use of alternate polyadenylation signals, have been identified for this gene. COL1A2 ENSG00000164692 NA
567 beta-2-microglobulin This gene encodes a serum protein found in association with the major histocompatibility complex (MHC) class I heavy chain on the surface of nearly all nucleated cells. The protein has a predominantly beta-pleated sheet structure that can form amyloid fibrils in some pathological conditions. The encoded antimicrobial protein displays antibacterial activity in amniotic fluid. A mutation in this gene has been shown to result in hypercatabolic hypoproteinemia. B2M ENSG00000166710 NA
4634 myosin light chain 3 MYL3 encodes myosin light chain 3, an alkali light chain also referred to in the literature as both the ventricular isoform and the slow skeletal muscle isoform. Mutations in MYL3 have been identified as a cause of mid-left ventricular chamber type hypertrophic cardiomyopathy. MYL3 ENSG00000160808 NA
1410 crystallin alpha B Mammalian lens crystallins are divided into alpha, beta, and gamma families. Alpha crystallins are composed of two gene products: alpha-A and alpha-B, for acidic and basic, respectively. Alpha crystallins can be induced by heat shock and are members of the small heat shock protein (HSP20) family. They act as molecular chaperones although they do not renature proteins and release them in the fashion of a true chaperone; instead they hold them in large soluble aggregates. Post-translational modifications decrease the ability to chaperone. These heterogeneous aggregates consist of 30-40 subunits; the alpha-A and alpha-B subunits have a 3:1 ratio, respectively. Two additional functions of alpha crystallins are an autokinase activity and participation in the intracellular architecture. The encoded protein has been identified as a moonlighting protein based on its ability to perform mechanistically distinct functions. Alpha-A and alpha-B gene products are differentially expressed; alpha-A is preferentially restricted to the lens and alpha-B is expressed widely in many tissues and organs. Elevated expression of alpha-B crystallin occurs in many neurological diseases; a missense mutation cosegregated in a family with a desmin-related myopathy. Alternative splicing results in multiple transcript variants. CRYAB ENSG00000109846 NA
5730 prostaglandin D2 synthase The protein encoded by this gene is a glutathione-independent prostaglandin D synthase that catalyzes the conversion of prostaglandin H2 (PGH2) to prostaglandin D2 (PGD2). PGD2 functions as a neuromodulator as well as a trophic factor in the central nervous system. PGD2 is also involved in smooth muscle contraction/relaxation and is a potent inhibitor of platelet aggregation. This gene is preferentially expressed in brain. Studies with transgenic mice overexpressing this gene suggest that this gene may be also involved in the regulation of non-rapid eye movement sleep. PTGDS ENSG00000107317 NA
350 apolipoprotein H Apolipoprotein H has been implicated in a variety of physiologic pathways including lipoprotein metabolism, coagulation, and the production of antiphospholipid autoantibodies. APOH may be a required cofactor for anionic phospholipid binding by the antiphospholipid autoantibodies found in sera of many patients with lupus and primary antiphospholipid syndrome, but it does not seem to be required for the reactivity of antiphospholipid autoantibodies associated with infections. APOH ENSG00000091583 NA
1293 collagen type VI alpha 3 chain This gene encodes the alpha-3 chain, one of the three alpha chains of type VI collagen, a beaded filament collagen found in most connective tissues. The alpha-3 chain of type VI collagen is much larger than the alpha-1 and -2 chains. This difference in size is largely due to an increase in the number of subdomains, similar to von Willebrand Factor type A domains, that are found in the amino terminal globular domain of all the alpha chains. These domains have been shown to bind extracellular matrix proteins, an interaction that explains the importance of this collagen in organizing matrix components. Mutations in the type VI collagen genes are associated with Bethlem myopathy, a rare autosomal dominant proximal myopathy with early childhood onset. Mutations in this gene are also a cause of Ullrich congenital muscular dystrophy, also referred to as Ullrich scleroatonic muscular dystrophy, an autosomal recessive congenital myopathy that is more severe than Bethlem myopathy. Multiple transcript variants have been identified, but the full-length nature of only some of these variants has been described. COL6A3 ENSG00000163359 NA
2012 epithelial membrane protein 1 NA EMP1 ENSG00000134531 NA
23413 neuronal calcium sensor 1 This gene is a member of the neuronal calcium sensor gene family, which encode calcium-binding proteins expressed predominantly in neurons. The protein encoded by this gene regulates G protein-coupled receptor phosphorylation in a calcium-dependent manner and can substitute for calmodulin. The protein is associated with secretory granules and modulates synaptic transmission and synaptic plasticity. Multiple transcript variants encoding different isoforms have been found for this gene. NCS1 ENSG00000107130 NA
7169 tropomyosin 2 (beta) This gene encodes beta-tropomyosin, a member of the actin filament binding protein family, and mainly expressed in slow, type 1 muscle fibers. Mutations in this gene can alter the expression of other sarcomeric tropomyosin proteins, and cause cap disease, nemaline myopathy and distal arthrogryposis syndromes. Alternatively spliced transcript variants encoding different isoforms have been found for this gene. TPM2 ENSG00000198467 NA
259 alpha-1-microglobulin/bikunin precursor This gene encodes a complex glycoprotein secreted in plasma. The precursor is proteolytically processed into distinct functioning proteins: alpha-1-microglobulin, which belongs to the superfamily of lipocalin transport proteins and may play a role in the regulation of inflammatory processes, and bikunin, which is a urinary trypsin inhibitor belonging to the superfamily of Kunitz-type protease inhibitors and plays an important role in many physiological and pathological processes. This gene is located on chromosome 9 in a cluster of lipocalin genes. AMBP ENSG00000106927 NA
7134 troponin C1, slow skeletal and cardiac type Troponin is a central regulatory protein of striated muscle contraction, and together with tropomyosin, is located on the actin filament. Troponin consists of 3 subunits: TnI, which is the inhibitor of actomyosin ATPase; TnT, which contains the binding site for tropomyosin; and TnC, the protein encoded by this gene. The binding of calcium to TnC abolishes the inhibitory action of TnI, thus allowing the interaction of actin with myosin, the hydrolysis of ATP, and the generation of tension. Mutations in this gene are associated with cardiomyopathy dilated type 1Z. TNNC1 ENSG00000114854 NA
3678 integrin subunit alpha 5 The product of this gene belongs to the integrin alpha chain family. Integrins are heterodimeric integral membrane proteins composed of an alpha subunit and a beta subunit that function in cell surface adhesion and signaling. The encoded preproprotein is proteolytically processed to generate light and heavy chains that comprise the alpha 5 subunit. This subunit associates with the beta 1 subunit to form a fibronectin receptor. This integrin may promote tumor invasion, and higher expression of this gene may be correlated with shorter survival time in lung cancer patients. Note that the integrin alpha 5 and integrin alpha V subunits are encoded by distinct genes. ITGA5 ENSG00000161638 NA
3263 hemopexin This gene encodes a plasma glycoprotein that binds heme with high affinity. The encoded protein is an acute phase protein that transports heme from the plasma to the liver and may be involved in protecting cells from oxidative stress. HPX ENSG00000110169 NA
88 actinin alpha 2 Alpha actinins belong to the spectrin gene superfamily which represents a diverse group of cytoskeletal proteins, including the alpha and beta spectrins and dystrophins. Alpha actinin is an actin-binding protein with multiple roles in different cell types. In nonmuscle cells, the cytoskeletal isoform is found along microfilament bundles and adherens-type junctions, where it is involved in binding actin to the membrane. In contrast, skeletal, cardiac, and smooth muscle isoforms are localized to the Z-disc and analogous dense bodies, where they help anchor the myofibrillar actin filaments. This gene encodes a muscle-specific, alpha actinin isoform that is expressed in both skeletal and cardiac muscles. Several transcript variants encoding different isoforms have been found for this gene. ACTN2 ENSG00000077522 NA
10627 myosin light chain 12A This gene encodes a nonsarcomeric myosin regulatory light chain. This protein is activated by phosphorylation and regulates smooth muscle and non-muscle cell contraction. This protein may also be involved in DNA damage repair by sequestering the transcriptional regulator apoptosis-antagonizing transcription factor (AATF)/Che-1 which functions as a repressor of p53-driven apoptosis. Alternate splicing results in multiple transcript variants. A pseudogene of this gene is found on chromosome 8. MYL12A ENSG00000101608 NA
1357 carboxypeptidase A1 This gene encodes a member of the carboxypeptidase A family of zinc metalloproteases. This enzyme is produced in the pancreas and preferentially cleaves C-terminal branched-chain and aromatic amino acids from dietary proteins. This gene and several family members are present in a gene cluster on chromosome 7. Mutations in this gene may be linked to chronic pancreatitis, while elevated protein levels may be associated with pancreatic cancer. CPA1 ENSG00000091704 NA
7273 titin This gene encodes a large abundant protein of striated muscle. The product of this gene is divided into two regions, a N-terminal I-band and a C-terminal A-band. The I-band, which is the elastic part of the molecule, contains two regions of tandem immunoglobulin domains on either side of a PEVK region that is rich in proline, glutamate, valine and lysine. The A-band, which is thought to act as a protein-ruler, contains a mixture of immunoglobulin and fibronectin repeats, and possesses kinase activity. An N-terminal Z-disc region and a C-terminal M-line region bind to the Z-line and M-line of the sarcomere, respectively, so that a single titin molecule spans half the length of a sarcomere. Titin also contains binding sites for muscle associated proteins so it serves as an adhesion template for the assembly of contractile machinery in muscle cells. It has also been identified as a structural protein for chromosomes. Alternative splicing of this gene results in multiple transcript variants. Considerable variability exists in the I-band, the M-line and the Z-disc regions of titin. Variability in the I-band region contributes to the differences in elasticity of different titin isoforms and, therefore, to the differences in elasticity of different muscle types. Mutations in this gene are associated with familial hypertrophic cardiomyopathy 9, and autoantibodies to titin are produced in patients with the autoimmune disease scleroderma. TTN ENSG00000155657 NA
8076 microfibrillar associated protein 5 This gene encodes a 25-kD microfibril-associated glycoprotein which is a component of microfibrils of the extracellular matrix. The encoded protein promotes attachment of cells to microfibrils via alpha-V-beta-3 integrin. Deficiency of this gene in mice results in neutropenia. Alternate splicing results in multiple transcript variants encoding different isoforms. MFAP5 ENSG00000197614 NA
NA NA NA NA ENSG00000272761 TRUE
1264 calponin 1 NA CNN1 ENSG00000130176 NA
3043 hemoglobin subunit beta The alpha (HBA) and beta (HBB) loci determine the structure of the 2 types of polypeptide chains in adult hemoglobin, Hb A. The normal adult hemoglobin tetramer consists of two alpha chains and two beta chains. Mutant beta globin causes sickle cell anemia. Absence of beta chain causes beta-zero-thalassemia. Reduced amounts of detectable beta globin causes beta-plus-thalassemia. The order of the genes in the beta-globin cluster is 5’-epsilon – gamma-G – gamma-A – delta – beta–3’. HBB ENSG00000244734 NA
2022 endoglin This gene encodes a homodimeric transmembrane protein which is a major glycoprotein of the vascular endothelium. This protein is a component of the transforming growth factor beta receptor complex and it binds to the beta1 and beta3 peptides with high affinity. Mutations in this gene cause hereditary hemorrhagic telangiectasia, also known as Osler-Rendu-Weber syndrome 1, an autosomal dominant multisystemic vascular dysplasia. This gene may also be involved in preeclampsia and several types of cancer. Alternatively spliced transcript variants encoding different isoforms have been found for this gene. ENG ENSG00000106991 NA
4607 myosin binding protein C, cardiac MYBPC3 encodes the cardiac isoform of myosin-binding protein C. Myosin-binding protein C is a myosin-associated protein found in the cross-bridge-bearing zone (C region) of A bands in striated muscle. MYBPC3, the cardiac isoform, is expressed exclusively in heart muscle. Regulatory phosphorylation of the cardiac isoform in vivo by cAMP-dependent protein kinase (PKA) upon adrenergic stimulation may be linked to modulation of cardiac contraction. Mutations in MYBPC3 are one cause of familial hypertrophic cardiomyopathy. MYBPC3 ENSG00000134571 NA
6711 spectrin beta, non-erythrocytic 1 Spectrin is an actin crosslinking and molecular scaffold protein that links the plasma membrane to the actin cytoskeleton, and functions in the determination of cell shape, arrangement of transmembrane proteins, and organization of organelles. It is composed of two antiparallel dimers of alpha- and beta- subunits. This gene is one member of a family of beta-spectrin genes. The encoded protein contains an N-terminal actin-binding domain, and 17 spectrin repeats which are involved in dimer formation. Multiple transcript variants encoding different isoforms have been found for this gene. SPTBN1 ENSG00000115306 NA
4015 lysyl oxidase This gene encodes a member of the lysyl oxidase family of proteins. Alternative splicing results in multiple transcript variants, at least one of which encodes a preproprotein that is proteolytically processed to generate a regulatory propeptide and the mature enzyme. The copper-dependent amine oxidase activity of this enzyme functions in the crosslinking of collagens and elastin, while the propeptide may play a role in tumor suppression. LOX ENSG00000113083 NA
8048 cysteine and glycine rich protein 3 This gene encodes a member of the CSRP family of LIM domain proteins, which may be involved in regulatory processes important for development and cellular differentiation. The LIM/double zinc-finger motif found in this protein is found in a group of proteins with critical functions in gene regulation, cell growth, and somatic differentiation. Mutations in this gene are thought to cause heritable forms of hypertrophic cardiomyopathy (HCM) and dilated cardiomyopathy (DCM) in humans. Alternatively spliced transcript variants with different 5’ UTR, but encoding the same protein, have been found for this gene. CSRP3 ENSG00000129170 NA
1303 collagen type XII alpha 1 chain This gene encodes the alpha chain of type XII collagen, a member of the FACIT (fibril-associated collagens with interrupted triple helices) collagen family. Type XII collagen is a homotrimer found in association with type I collagen, an association that is thought to modify the interactions between collagen I fibrils and the surrounding matrix. Alternatively spliced transcript variants encoding different isoforms have been identified. COL12A1 ENSG00000111799 NA
335 apolipoprotein A1 This gene encodes apolipoprotein A-I, which is the major protein component of high density lipoprotein (HDL) in plasma. The encoded preproprotein is proteolytically processed to generate the mature protein, which promotes cholesterol efflux from tissues to the liver for excretion, and is a cofactor for lecithin cholesterolacyltransferase (LCAT), an enzyme responsible for the formation of most plasma cholesteryl esters. This gene is closely linked with two other apolipoprotein genes on chromosome 11. Defects in this gene are associated with HDL deficiencies, including Tangier disease, and with systemic non-neuropathic amyloidosis. Alternative splicing results in multiple transcript variants, at least one of which encodes a preproprotein. APOA1 ENSG00000118137 NA
11034 destrin, actin depolymerizing factor The product of this gene belongs to the actin-binding proteins ADF family. This family of proteins is responsible for enhancing the turnover rate of actin in vivo. This gene encodes the actin depolymerizing protein that severs actin filaments (F-actin) and binds to actin monomers (G-actin). Two transcript variants encoding distinct isoforms have been identified for this gene. DSTN ENSG00000125868 NA
51559 5’-nucleotidase domain containing 3 NA NT5DC3 ENSG00000111696 NA
57326 PBX homeobox interacting protein 1 The protein encoded by this gene interacts with the PBX1 homeodomain protein, inhibiting its transcriptional activation potential by preventing its binding to DNA. The encoded protein, which is primarily cytosolic but can shuttle to the nucleus, also can interact with estrogen receptors alpha and beta and promote the proliferation of breast cancer, brain tumors, and lung cancer. Several transcript variants encoding different isoforms have been found for this gene. More variants exist, but their full-length natures have yet to be determined. PBXIP1 ENSG00000163346 NA
6175 ribosomal protein lateral stalk subunit P0 Ribosomes, the organelles that catalyze protein synthesis, consist of a small 40S subunit and a large 60S subunit. Together these subunits are composed of 4 RNA species and approximately 80 structurally distinct proteins. This gene encodes a ribosomal protein that is a component of the 60S subunit. The protein, which is the functional equivalent of the E. coli L10 ribosomal protein, belongs to the L10P family of ribosomal proteins. It is a neutral phosphoprotein with a C-terminal end that is nearly identical to the C-terminal ends of the acidic ribosomal phosphoproteins P1 and P2. The P0 protein can interact with P1 and P2 to form a pentameric complex consisting of P1 and P2 dimers, and a P0 monomer. The protein is located in the cytoplasm. Transcript variants derived from alternative splicing exist; they encode the same protein. As is typical for genes encoding ribosomal proteins, there are multiple processed pseudogenes of this gene dispersed through the genome. RPLP0 ENSG00000089157 NA
308 annexin A5 The protein encoded by this gene belongs to the annexin family of calcium-dependent phospholipid binding proteins some of which have been implicated in membrane-related events along exocytotic and endocytotic pathways. Annexin 5 is a phospholipase A2 and protein kinase C inhibitory protein with calcium channel activity and a potential role in cellular signal transduction, inflammation, growth and differentiation. Annexin 5 has also been described as placental anticoagulant protein I, vascular anticoagulant-alpha, endonexin II, lipocortin V, placental protein 4 and anchorin CII. The gene spans 29 kb containing 13 exons, and encodes a single transcript of approximately 1.6 kb and a protein product with a molecular weight of about 35 kDa. ANXA5 ENSG00000164111 NA
10457 glycoprotein nmb The protein encoded by this gene is a type I transmembrane glycoprotein which shows homology to the pMEL17 precursor, a melanocyte-specific protein. GPNMB shows expression in the lowly metastatic human melanoma cell lines and xenografts but does not show expression in the highly metastatic cell lines. GPNMB may be involved in growth delay and reduction of metastatic potential. Two transcript variants encoding different isoforms have been found for this gene. GPNMB ENSG00000136235 NA
11030 RNA binding protein with multiple splicing This gene encodes a member of the RNA recognition motif family of RNA-binding proteins. The RNA recognition motif is between 80-100 amino acids in length and family members contain one to four copies of the motif. The RNA recognition motif consists of two short stretches of conserved sequence, as well as a few highly conserved hydrophobic residues. The encoded protein has a single, putative RNA recognition motif in its N-terminus. Alternative splicing results in multiple transcript variants encoding different isoforms. RBPMS ENSG00000157110 NA
5644 protease, serine 1 This gene encodes a trypsinogen, which is a member of the trypsin family of serine proteases. This enzyme is secreted by the pancreas and cleaved to its active form in the small intestine. It is active on peptide linkages involving the carboxyl group of lysine or arginine. Mutations in this gene are associated with hereditary pancreatitis. This gene and several other trypsinogen genes are localized to the T cell receptor beta locus on chromosome 7. PRSS1 ENSG00000204983 NA
3912 laminin subunit beta 1 Laminins, a family of extracellular matrix glycoproteins, are the major noncollagenous constituent of basement membranes. They have been implicated in a wide variety of biological processes including cell adhesion, differentiation, migration, signaling, neurite outgrowth and metastasis. Laminins are composed of 3 non identical chains: laminin alpha, beta and gamma (formerly A, B1, and B2, respectively) and they form a cruciform structure consisting of 3 short arms, each formed by a different chain, and a long arm composed of all 3 chains. Each laminin chain is a multidomain protein encoded by a distinct gene. Several isoforms of each chain have been described. Different alpha, beta and gamma chain isomers combine to give rise to different heterotrimeric laminin isoforms which are designated by Arabic numerals in the order of their discovery, i.e. alpha1beta1gamma1 heterotrimer is laminin 1. The biological functions of the different chains and trimer molecules are largely unknown, but some of the chains have been shown to differ with respect to their tissue distribution, presumably reflecting diverse functions in vivo. This gene encodes the beta chain isoform laminin, beta 1. The beta 1 chain has 7 structurally distinct domains which it shares with other beta chain isomers. The C-terminal helical region containing domains I and II are separated by domain alpha, domains III and V contain several EGF-like repeats, and domains IV and VI have a globular conformation. Laminin, beta 1 is expressed in most tissues that produce basement membranes, and is one of the 3 chains constituting laminin 1, the first laminin isolated from Engelbreth-Holm-Swarm (EHS) tumor. A sequence in the beta 1 chain that is involved in cell attachment, chemotaxis, and binding to the laminin receptor was identified and shown to have the capacity to inhibit metastasis. LAMB1 ENSG00000091136 NA
7916 proline rich coiled-coil 2A A cluster of genes, BAT1-BAT5, has been localized in the vicinity of the genes for TNF alpha and TNF beta. These genes are all within the human major histocompatibility complex class III region. This gene has microsatellite repeats which are associated with the age-at-onset of insulin-dependent diabetes mellitus (IDDM) and possibly thought to be involved with the inflammatory process of pancreatic beta-cell destruction during the development of IDDM. This gene is also a candidate gene for the development of rheumatoid arthritis. Two transcript variants encoding the same protein have been found for this gene. PRRC2A ENSG00000204469 NA
4313 matrix metallopeptidase 2 This gene is a member of the matrix metalloproteinase (MMP) gene family, that are zinc-dependent enzymes capable of cleaving components of the extracellular matrix and molecules involved in signal transduction. The protein encoded by this gene is a gelatinase A, type IV collagenase, that contains three fibronectin type II repeats in its catalytic site that allow binding of denatured type IV and V collagen and elastin. Unlike most MMP family members, activation of this protein can occur on the cell membrane. This enzyme can be activated extracellularly by proteases, or, intracellulary by its S-glutathiolation with no requirement for proteolytical removal of the pro-domain. This protein is thought to be involved in multiple pathways including roles in the nervous system, endometrial menstrual breakdown, regulation of vascularization, and metastasis. Mutations in this gene have been associated with Winchester syndrome and Nodulosis-Arthropathy-Osteolysis (NAO) syndrome. Alternative splicing results in multiple transcript variants encoding different isoforms. MMP2 ENSG00000087245 NA
4892 nebulin related anchoring protein NA NRAP ENSG00000197893 NA
336 apolipoprotein A2 This gene encodes apolipoprotein (apo-) A-II, which is the second most abundant protein of the high density lipoprotein particles. The protein is found in plasma as a monomer, homodimer, or heterodimer with apolipoprotein D. Defects in this gene may result in apolipoprotein A-II deficiency or hypercholesterolemia. APOA2 ENSG00000158874 NA
30819 potassium voltage-gated channel interacting protein 2 This gene encodes a member of the family of voltage-gated potassium (Kv) channel-interacting proteins (KCNIPs), which belongs to the recoverin branch of the EF-hand superfamily. Members of the KCNIP family are small calcium binding proteins. They all have EF-hand-like domains, and differ from each other in the N-terminus. They are integral subunit components of native Kv4 channel complexes. They may regulate A-type currents, and hence neuronal excitability, in response to changes in intracellular calcium. Multiple alternatively spliced transcript variants encoding distinct isoforms have been identified from this gene. KCNIP2 ENSG00000120049 NA
3851 keratin 4 The protein encoded by this gene is a member of the keratin gene family. The type II cytokeratins consist of basic or neutral proteins which are arranged in pairs of heterotypic keratin chains coexpressed during differentiation of simple and stratified epithelial tissues. This type II cytokeratin is specifically expressed in differentiated layers of the mucosal and esophageal epithelia with family member KRT13. Mutations in these genes have been associated with White Sponge Nevus, characterized by oral, esophageal, and anal leukoplakia. The type II cytokeratins are clustered in a region of chromosome 12q12-q13. KRT4 ENSG00000170477 NA
4624 myosin, heavy chain 6, cardiac muscle, alpha Cardiac muscle myosin is a hexamer consisting of two heavy chain subunits, two light chain subunits, and two regulatory subunits. This gene encodes the alpha heavy chain subunit of cardiac myosin. The gene is located 4kb downstream of the gene encoding the beta heavy chain subunit of cardiac myosin. Mutations in this gene cause familial hypertrophic cardiomyopathy and atrial septal defect 3. MYH6 ENSG00000197616 NA
5066 peptidylglycine alpha-amidating monooxygenase This gene encodes a multifunctional protein. The encoded preproprotein is proteolytically processed to generate the mature enzyme. This enzyme includes two domains with distinct catalytic activities, a peptidylglycine alpha-hydroxylating monooxygenase (PHM) domain and a peptidyl-alpha-hydroxyglycine alpha-amidating lyase (PAL) domain. These catalytic domains work sequentially to catalyze the conversion of neuroendocrine peptides to active alpha-amidated products. Alternative splicing results in multiple transcript variants, at least one of which encodes an isoform that is proteolytically processed. PAM ENSG00000145730 NA
8831 synaptic Ras GTPase activating protein 1 The protein encoded by this gene is a major component of the postsynaptic density (PSD), a group of proteins found associated with NMDA receptors at synapses. The encoded protein is phosphorylated by calmodulin-dependent protein kinase II and dephosphorylated by NMDA receptor activation. Defects in this gene are a cause of mental retardation autosomal dominant type 5 (MRD5). SYNGAP1 ENSG00000197283 NA
8557 titin-cap Sarcomere assembly is regulated by the muscle protein titin. Titin is a giant elastic protein with kinase activity that extends half the length of a sarcomere. It serves as a scaffold to which myofibrils and other muscle related proteins are attached. This gene encodes a protein found in striated and cardiac muscle that binds to the titin Z1-Z2 domains and is a substrate of titin kinase, interactions thought to be critical to sarcomere assembly. Mutations in this gene are associated with limb-girdle muscular dystrophy type 2G. TCAP ENSG00000173991 NA
1513 cathepsin K The protein encoded by this gene is a lysosomal cysteine proteinase involved in bone remodeling and resorption. This protein, which is a member of the peptidase C1 protein family, is predominantly expressed in osteoclasts. However, the encoded protein is also expressed in a significant fraction of human breast cancers, where it could contribute to tumor invasiveness. Mutations in this gene are the cause of pycnodysostosis, an autosomal recessive disease characterized by osteosclerosis and short stature. CTSK ENSG00000143387 NA
3956 galectin 1 The galectins are a family of beta-galactoside-binding proteins implicated in modulating cell-cell and cell-matrix interactions. This gene product may act as an autocrine negative growth factor that regulates cell proliferation. LGALS1 ENSG00000100097 NA
348 apolipoprotein E The protein encoded by this gene is a major apoprotein of the chylomicron. It binds to a specific liver and peripheral cell receptor, and is essential for the normal catabolism of triglyceride-rich lipoprotein constituents. This gene maps to chromosome 19 in a cluster with the related apolipoprotein C1 and C2 genes. Mutations in this gene result in familial dysbetalipoproteinemia, or type III hyperlipoproteinemia (HLP III), in which increased plasma cholesterol and triglycerides are the consequence of impaired clearance of chylomicron and VLDL remnants. Alternative splicing results in multiple transcript variants. APOE ENSG00000130203 NA
229 aldolase, fructose-bisphosphate B Fructose-1,6-bisphosphate aldolase (EC 4.1.2.13) is a tetrameric glycolytic enzyme that catalyzes the reversible conversion of fructose-1,6-bisphosphate to glyceraldehyde 3-phosphate and dihydroxyacetone phosphate. Vertebrates have 3 aldolase isozymes which are distinguished by their electrophoretic and catalytic properties. Differences indicate that aldolases A, B, and C are distinct proteins, the products of a family of related ‘housekeeping’ genes exhibiting developmentally regulated expression of the different isozymes. The developing embryo produces aldolase A, which is produced in even greater amounts in adult muscle where it can be as much as 5% of total cellular protein. In adult liver, kidney and intestine, aldolase A expression is repressed and aldolase B is produced. In brain and other nervous tissue, aldolase A and C are expressed about equally. There is a high degree of homology between aldolase A and C. Defects in ALDOB cause hereditary fructose intolerance. ALDOB ENSG00000136872 NA
23352 ubiquitin protein ligase E3 component n-recognin 4 The protein encoded by this gene is an E3 ubiquitin-protein ligase that interacts with the retinoblastoma-associated protein in the nucleus and with calcium-bound calmodulin in the cytoplasm. The encoded protein appears to be a cytoskeletal component in the cytoplasm and part of the chromatin scaffold in the nucleus. In addition, this protein is a target of the human papillomavirus type 16 E7 oncoprotein. UBR4 ENSG00000127481 NA
3242 4-hydroxyphenylpyruvate dioxygenase The protein encoded by this gene is an enzyme in the catabolic pathway of tyrosine. The encoded protein catalyzes the conversion of 4-hydroxyphenylpyruvate to homogentisate. Defects in this gene are a cause of tyrosinemia type 3 (TYRO3) and hawkinsinuria (HAWK). Two transcript variants encoding different isoforms have been found for this gene. HPD ENSG00000158104 NA
4053 latent transforming growth factor beta binding protein 2 The protein encoded by this gene belongs to the family of latent transforming growth factor (TGF)-beta binding proteins (LTBP), which are extracellular matrix proteins with multi-domain structure. This protein is the largest member of the LTBP family possessing unique regions and with most similarity to the fibrillins. It has thus been suggested that it may have multiple functions: as a member of the TGF-beta latent complex, as a structural component of microfibrils, and a role in cell adhesion. LTBP2 ENSG00000119681 NA
ENSG00000234961 NA NA RP11-124N14.3 ENSG00000234961 NA
10136 chymotrypsin like elastase family member 3A Elastases form a subfamily of serine proteases that hydrolyze many proteins in addition to elastin. Humans have six elastase genes which encode the structurally similar proteins elastase 1, 2, 2A, 2B, 3A, and 3B. Unlike other elastases, elastase 3A has little elastolytic activity. Like most of the human elastases, elastase 3A is secreted from the pancreas as a zymogen and, like other serine proteases such as trypsin, chymotrypsin and kallikrein, it has a digestive function in the intestine. Elastase 3A preferentially cleaves proteins after alanine residues. Elastase 3A may also function in the intestinal transport and metabolism of cholesterol. Both elastase 3A and elastase 3B have been referred to as protease E and as elastase 1. CELA3A ENSG00000142789 NA
ENSG00000180139 ACTA2 antisense RNA 1 NA ACTA2-AS1 ENSG00000180139 NA
7139 troponin T2, cardiac type The protein encoded by this gene is the tropomyosin-binding subunit of the troponin complex, which is located on the thin filament of striated muscles and regulates muscle contraction in response to alterations in intracellular calcium ion concentration. Mutations in this gene have been associated with familial hypertrophic cardiomyopathy as well as with dilated cardiomyopathy. Transcripts for this gene undergo alternative splicing that results in many tissue-specific isoforms, however, the full-length nature of some of these variants has not yet been determined. TNNT2 ENSG00000118194 NA
81 actinin alpha 4 Alpha actinins belong to the spectrin gene superfamily which represents a diverse group of cytoskeletal proteins, including the alpha and beta spectrins and dystrophins. Alpha actinin is an actin-binding protein with multiple roles in different cell types. In nonmuscle cells, the cytoskeletal isoform is found along microfilament bundles and adherens-type junctions, where it is involved in binding actin to the membrane. In contrast, skeletal, cardiac, and smooth muscle isoforms are localized to the Z-disc and analogous dense bodies, where they help anchor the myofibrillar actin filaments. This gene encodes a nonmuscle, alpha actinin isoform which is concentrated in the cytoplasm, and thought to be involved in metastatic processes. Mutations in this gene have been associated with focal and segmental glomerulosclerosis. ACTN4 ENSG00000130402 NA
2006 elastin This gene encodes a protein that is one of the two components of elastic fibers. The encoded protein is rich in hydrophobic amino acids such as glycine and proline, which form mobile hydrophobic regions bounded by crosslinks between lysine residues. Deletions and mutations in this gene are associated with supravalvular aortic stenosis (SVAS) and autosomal dominant cutis laxa. Multiple transcript variants encoding different isoforms have been found for this gene. ELN ENSG00000049540 NA
8407 transgelin 2 The protein encoded by this gene is similar to the protein transgelin, which is one of the earliest markers of differentiated smooth muscle. The specific function of this protein has not yet been determined, although it is thought to be a tumor suppressor. Multiple transcript variants encoding different isoforms have been found for this gene. TAGLN2 ENSG00000158710 NA
25802 leiomodin 1 The leiomodin 1 protein has a putative membrane-spanning region and 2 types of tandemly repeated blocks. The transcript is expressed in all tissues tested, with the highest levels in thyroid, eye muscle, skeletal muscle, and ovary. Increased expression of leiomodin 1 may be linked to Graves’ disease and thyroid-associated ophthalmopathy. LMOD1 ENSG00000163431 NA
3983 actin binding LIM protein 1 This gene encodes a cytoskeletal LIM protein that binds to actin filaments via a domain that is homologous to erythrocyte dematin. LIM domains, found in over 60 proteins, play key roles in the regulation of developmental pathways. LIM domains also function as protein-binding interfaces, mediating specific protein-protein interactions. The protein encoded by this gene could mediate such interactions between actin filaments and cytoplasmic targets. Alternatively spliced transcript variants encoding different isoforms have been identified. ABLIM1 ENSG00000099204 NA
59 actin, alpha 2, smooth muscle, aorta The protein encoded by this gene belongs to the actin family of proteins, which are highly conserved proteins that play a role in cell motility, structure and integrity. Alpha, beta and gamma actin isoforms have been identified, with alpha actins being a major constituent of the contractile apparatus, while beta and gamma actins are involved in the regulation of cell motility. This actin is an alpha actin that is found in skeletal muscle. Defects in this gene cause aortic aneurysm familial thoracic type 6. Multiple alternatively spliced variants, encoding the same protein, have been identified. ACTA2 ENSG00000107796 NA
58 actin, alpha 1, skeletal muscle The product encoded by this gene belongs to the actin family of proteins, which are highly conserved proteins that play a role in cell motility, structure and integrity. Alpha, beta and gamma actin isoforms have been identified, with alpha actins being a major constituent of the contractile apparatus, while beta and gamma actins are involved in the regulation of cell motility. This actin is an alpha actin that is found in skeletal muscle. Mutations in this gene cause nemaline myopathy type 3, congenital myopathy with excess of thin myofilaments, congenital myopathy with cores, and congenital myopathy with fiber-type disproportion, diseases that lead to muscle fiber defects. ACTA1 ENSG00000143632 NA
7077 TIMP metallopeptidase inhibitor 2 This gene is a member of the TIMP gene family. The proteins encoded by this gene family are natural inhibitors of the matrix metalloproteinases, a group of peptidases involved in degradation of the extracellular matrix. In addition to an inhibitory role against metalloproteinases, the encoded protein has a unique role among TIMP family members in its ability to directly suppress the proliferation of endothelial cells. As a result, the encoded protein may be critical to the maintenance of tissue homeostasis by suppressing the proliferation of quiescent tissues in response to angiogenic factors, and by inhibiting protease activity in tissues undergoing remodelling of the extracellular matrix. TIMP2 ENSG00000035862 NA
3798 kinesin family member 5A This gene encodes a member of the kinesin family of proteins. Members of this family are part of a multisubunit complex that functions as a microtubule motor in intracellular organelle transport. Mutations in this gene cause autosomal dominant spastic paraplegia 10. KIF5A ENSG00000155980 NA
3164 nuclear receptor subfamily 4 group A member 1 This gene encodes a member of the steroid-thyroid hormone-retinoid receptor superfamily. Expression is induced by phytohemagglutinin in human lymphocytes and by serum stimulation of arrested fibroblasts. The encoded protein acts as a nuclear transcription factor. Translocation of the protein from the nucleus to mitochondria induces apoptosis. Multiple transcript variants encoding different isoforms have been found for this gene. NR4A1 ENSG00000123358 NA
# Save the Ensembl IDs queried for factor 9, one per line.
write.table(out$query,
            paste0("../utilities/GTEX2013_sparse_fac_sqrt/gene_names_clus_", 9, ".txt"),
            col.names = FALSE, row.names = FALSE, quote = FALSE)
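
The per-factor export above is repeated once for each factor with the factor index hard-coded. A minimal sketch of the same step written as a loop, assuming `gene_list` is the factor-by-gene matrix of Ensembl IDs built earlier in this report. The helper name `write_factor_genes`, its arguments, and the toy data are hypothetical. It also writes the input IDs directly rather than `out$query`, which could differ if the query returned duplicate hits.

```r
# Hypothetical helper: write one file of gene IDs per factor (one row of gene_list each).
write_factor_genes <- function(gene_list, out_dir, prefix = "gene_names_clus_") {
  dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)
  paths <- character(nrow(gene_list))
  for (k in seq_len(nrow(gene_list))) {
    paths[k] <- file.path(out_dir, paste0(prefix, k, ".txt"))
    # Same options as the per-factor call: bare IDs, one per line.
    write.table(gene_list[k, ], paths[k],
                col.names = FALSE, row.names = FALSE, quote = FALSE)
  }
  invisible(paths)
}

# Toy stand-in for gene_list (2 factors x 3 genes), written to a temporary directory.
toy_gene_list <- rbind(c("ENSG00000171401", "ENSG00000170477", "ENSG00000089157"),
                       c("ENSG00000133112", "ENSG00000156508", "ENSG00000100316"))
paths <- write_factor_genes(toy_gene_list, file.path(tempdir(), "sfa_gene_lists"))
```

In the report itself the call would presumably be `write_factor_genes(gene_list, "../utilities/GTEX2013_sparse_fac_sqrt")`, which keeps the file naming used above while removing the hand-edited index.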

Factor 10 Annotations

out <- mygene::queryMany(gene_list[10,],  scopes="ensembl.gene", fields=c("name", "summary", "symbol"), species="human");
## Finished
kable(as.data.frame(out))
X_id symbol query name summary
7178 TPT1 ENSG00000133112 tumor protein, translationally-controlled 1 NA
1915 EEF1A1 ENSG00000156508 eukaryotic translation elongation factor 1 alpha 1 This gene encodes an isoform of the alpha subunit of the elongation factor-1 complex, which is responsible for the enzymatic delivery of aminoacyl tRNAs to the ribosome. This isoform (alpha 1) is expressed in brain, placenta, lung, liver, kidney, and pancreas, and the other isoform (alpha 2) is expressed in brain, heart and skeletal muscle. This isoform is identified as an autoantigen in 66% of patients with Felty syndrome. This gene has been found to have multiple copies on many chromosomes, some of which, if not all, represent different pseudogenes.
ENSG00000237973 MTCO1P12 ENSG00000237973 MT-CO1 pseudogene 12 NA
6122 RPL3 ENSG00000100316 ribosomal protein L3 Ribosomes, the complexes that catalyze protein synthesis, consist of a small 40S subunit and a large 60S subunit. Together these subunits are composed of 4 RNA species and approximately 80 structurally distinct proteins. This gene encodes a ribosomal protein that is a component of the 60S subunit. The protein belongs to the L3P family of ribosomal proteins and it is located in the cytoplasm. The protein can bind to the HIV-1 TAR mRNA, and it has been suggested that the protein contributes to tat-mediated transactivation. This gene is co-transcribed with several small nucleolar RNA genes, which are located in several of this gene’s introns. Alternate transcriptional splice variants, encoding different isoforms, have been characterized. As is typical for genes encoding ribosomal proteins, there are multiple processed pseudogenes of this gene dispersed through the genome.
23521 RPL13A ENSG00000142541 ribosomal protein L13a Ribosomes, the organelles that catalyze protein synthesis, consist of a small 40S subunit and a large 60S subunit. Together these subunits are composed of 4 RNA species and approximately 80 structurally distinct proteins. This gene encodes a member of the L13P family of ribosomal proteins that is a component of the 60S subunit. The encoded protein also plays a role in the repression of inflammatory genes as a component of the IFN-gamma-activated inhibitor of translation (GAIT) complex. This gene is co-transcribed with the small nucleolar RNA genes U32, U33, U34, and U35, which are located in the second, fourth, fifth, and sixth introns, respectively. As is typical for genes encoding ribosomal proteins, there are multiple processed pseudogenes of this gene dispersed throughout the genome. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene.
3488 IGFBP5 ENSG00000115461 insulin like growth factor binding protein 5 NA
6202 RPS8 ENSG00000142937 ribosomal protein S8 Ribosomes, the organelles that catalyze protein synthesis, consist of a small 40S subunit and a large 60S subunit. Together these subunits are composed of 4 RNA species and approximately 80 structurally distinct proteins. This gene encodes a ribosomal protein that is a component of the 40S subunit. The protein belongs to the S8E family of ribosomal proteins. It is located in the cytoplasm. Increased expression of this gene in colorectal tumors and colon polyps compared to matched normal colonic mucosa has been observed. This gene is co-transcribed with the small nucleolar RNA genes U38A, U38B, U39, and U40, which are located in its fourth, fifth, first, and second introns, respectively. As is typical for genes encoding ribosomal proteins, there are multiple processed pseudogenes of this gene dispersed through the genome.
6194 RPS6 ENSG00000137154 ribosomal protein S6 Ribosomes, the organelles that catalyze protein synthesis, consist of a small 40S subunit and a large 60S subunit. Together these subunits are composed of 4 RNA species and approximately 80 structurally distinct proteins. This gene encodes a cytoplasmic ribosomal protein that is a component of the 40S subunit. The protein belongs to the S6E family of ribosomal proteins. It is the major substrate of protein kinases in the ribosome, with subsets of five C-terminal serine residues phosphorylated by different protein kinases. Phosphorylation is induced by a wide range of stimuli, including growth factors, tumor-promoting agents, and mitogens. Dephosphorylation occurs at growth arrest. The protein may contribute to the control of cell growth and proliferation through the selective translation of particular classes of mRNA. As is typical for genes encoding ribosomal proteins, there are multiple processed pseudogenes of this gene dispersed through the genome.
6137 RPL13 ENSG00000167526 ribosomal protein L13 Ribosomes, the organelles that catalyze protein synthesis, consist of a small 40S subunit and a large 60S subunit. Together these subunits are composed of 4 RNA species and approximately 80 structurally distinct proteins. This gene encodes a ribosomal protein that is a component of the 60S subunit. The protein belongs to the L13E family of ribosomal proteins. It is located in the cytoplasm. This gene is expressed at significantly higher levels in benign breast lesions than in breast carcinomas. Alternatively spliced transcript variants encoding distinct isoforms have been found for this gene. As is typical for genes encoding ribosomal proteins, there are multiple processed pseudogenes of this gene dispersed through the genome.
6188 RPS3 ENSG00000149273 ribosomal protein S3 Ribosomes, the organelles that catalyze protein synthesis, consist of a small 40S subunit and a large 60S subunit. Together these subunits are composed of 4 RNA species and approximately 80 structurally distinct proteins. This gene encodes a ribosomal protein that is a component of the 40S subunit, where it forms part of the domain where translation is initiated. The protein belongs to the S3P family of ribosomal proteins. Studies of the mouse and rat proteins have demonstrated that the protein has an extraribosomal role as an endonuclease involved in the repair of UV-induced DNA damage. The protein appears to be located in both the cytoplasm and nucleus but not in the nucleolus. Higher levels of expression of this gene in colon adenocarcinomas and adenomatous polyps compared to adjacent normal colonic mucosa have been observed. This gene is co-transcribed with the small nucleolar RNA genes U15A and U15B, which are located in its first and fifth introns, respectively. As is typical for genes encoding ribosomal proteins, there are multiple processed pseudogenes of this gene dispersed through the genome. Multiple alternatively spliced transcript variants encoding different isoforms have been found for this gene.
6135 RPL11 ENSG00000142676 ribosomal protein L11 Ribosomes, the organelles that catalyze protein synthesis, consist of a small 40S subunit and a large 60S subunit. Together these subunits are composed of 4 RNA species and approximately 80 structurally distinct proteins. This gene encodes a ribosomal protein that is a component of the 60S subunit. The protein belongs to the L5P family of ribosomal proteins. It is located in the cytoplasm. The protein probably associates with the 5S rRNA. Alternatively spliced transcript variants encoding different isoforms have been found for this gene. As is typical for genes encoding ribosomal proteins, there are multiple processed pseudogenes of this gene dispersed through the genome.
6187 RPS2 ENSG00000140988 ribosomal protein S2 Ribosomes, the organelles that catalyze protein synthesis, consist of a small 40S subunit and a large 60S subunit. Together these subunits are composed of 4 RNA species and approximately 80 structurally distinct proteins. This gene encodes a ribosomal protein that is a component of the 40S subunit. The protein belongs to the S5P family of ribosomal proteins. It is located in the cytoplasm. This gene shares sequence similarity with mouse LLRep3. It is co-transcribed with the small nucleolar RNA gene U64, which is located in its third intron. As is typical for genes encoding ribosomal proteins, there are multiple processed pseudogenes of this gene dispersed through the genome.
6203 RPS9 ENSG00000170889 ribosomal protein S9 Ribosomes, the organelles that catalyze protein synthesis, consist of a small 40S subunit and a large 60S subunit. Together these subunits are composed of 4 RNA species and approximately 80 structurally distinct proteins. This gene encodes a ribosomal protein that is a component of the 40S subunit. The protein belongs to the S4P family of ribosomal proteins. It is located in the cytoplasm. Variable expression of this gene in colorectal cancers compared to adjacent normal tissues has been observed, although no correlation between the level of expression and the severity of the disease has been found. As is typical for genes encoding ribosomal proteins, multiple processed pseudogenes derived from this gene are dispersed through the genome.
6130 RPL7A ENSG00000148303 ribosomal protein L7a Cytoplasmic ribosomes, organelles that catalyze protein synthesis, consist of a small 40S subunit and a large 60S subunit. Together these subunits are composed of 4 RNA species and approximately 80 structurally distinct proteins. This gene encodes a ribosomal protein that is a component of the 60S subunit. The protein belongs to the L7AE family of ribosomal proteins. It can interact with a subclass of nuclear hormone receptors, including thyroid hormone receptor, and inhibit their ability to transactivate by preventing their binding to their DNA response elements. This gene is included in the surfeit gene cluster, a group of very tightly linked genes that do not share sequence similarity. It is co-transcribed with the U24, U36a, U36b, and U36c small nucleolar RNA genes, which are located in its second, fifth, fourth, and sixth introns, respectively. This gene rearranges with the trk proto-oncogene to form the chimeric oncogene trk-2h, which encodes an oncoprotein consisting of the N terminus of ribosomal protein L7a fused to the receptor tyrosine kinase domain of trk. As is typical for genes encoding ribosomal proteins, there are multiple processed pseudogenes of this gene dispersed through the genome.
9349 RPL23 ENSG00000125691 ribosomal protein L23 Ribosomes, the organelles that catalyze protein synthesis, consist of a small 40S subunit and a large 60S subunit. Together these subunits are composed of 4 RNA species and approximately 80 structurally distinct proteins. This gene encodes a ribosomal protein that is a component of the 60S subunit. The protein belongs to the L14P family of ribosomal proteins. It is located in the cytoplasm. This gene has been referred to as rpL17 because the encoded protein shares amino acid identity with ribosomal protein L17 from Saccharomyces cerevisiae; however, its official symbol is RPL23. As is typical for genes encoding ribosomal proteins, there are multiple processed pseudogenes of this gene dispersed through the genome.
6160 RPL31 ENSG00000071082 ribosomal protein L31 Ribosomes, the organelles that catalyze protein synthesis, consist of a small 40S subunit and a large 60S subunit. Together these subunits are composed of 4 RNA species and approximately 80 structurally distinct proteins. This gene encodes a ribosomal protein that is a component of the 60S subunit. The protein belongs to the L31E family of ribosomal proteins. It is located in the cytoplasm. Higher levels of expression of this gene in familial adenomatous polyps compared to matched normal tissues have been observed. As is typical for genes encoding ribosomal proteins, there are multiple processed pseudogenes of this gene dispersed through the genome. Alternatively spliced transcript variants encoding distinct isoforms have been found for this gene.
6181 RPLP2 ENSG00000177600 ribosomal protein lateral stalk subunit P2 Ribosomes, the organelles that catalyze protein synthesis, consist of a small 40S subunit and a large 60S subunit. Together these subunits are composed of 4 RNA species and approximately 80 structurally distinct proteins. This gene encodes a ribosomal phosphoprotein that is a component of the 60S subunit. The protein, which is a functional equivalent of the E. coli L7/L12 ribosomal protein, belongs to the L12P family of ribosomal proteins. It plays an important role in the elongation step of protein synthesis. Unlike most ribosomal proteins, which are basic, the encoded protein is acidic. Its C-terminal end is nearly identical to the C-terminal ends of the ribosomal phosphoproteins P0 and P1. The P2 protein can interact with P0 and P1 to form a pentameric complex consisting of P1 and P2 dimers, and a P0 monomer. The protein is located in the cytoplasm. As is typical for genes encoding ribosomal proteins, there are multiple processed pseudogenes of this gene dispersed through the genome.
6175 RPLP0 ENSG00000089157 ribosomal protein lateral stalk subunit P0 Ribosomes, the organelles that catalyze protein synthesis, consist of a small 40S subunit and a large 60S subunit. Together these subunits are composed of 4 RNA species and approximately 80 structurally distinct proteins. This gene encodes a ribosomal protein that is a component of the 60S subunit. The protein, which is the functional equivalent of the E. coli L10 ribosomal protein, belongs to the L10P family of ribosomal proteins. It is a neutral phosphoprotein with a C-terminal end that is nearly identical to the C-terminal ends of the acidic ribosomal phosphoproteins P1 and P2. The P0 protein can interact with P1 and P2 to form a pentameric complex consisting of P1 and P2 dimers, and a P0 monomer. The protein is located in the cytoplasm. Transcript variants derived from alternative splicing exist; they encode the same protein. As is typical for genes encoding ribosomal proteins, there are multiple processed pseudogenes of this gene dispersed through the genome.
6205 RPS11 ENSG00000142534 ribosomal protein S11 Ribosomes, the organelles that catalyze protein synthesis, consist of a small 40S subunit and a large 60S subunit. Together these subunits are composed of 4 RNA species and approximately 80 structurally distinct proteins. This gene encodes a member of the S17P family of ribosomal proteins that is a component of the 40S subunit. This gene is co-transcribed with the small nucleolar RNA gene U35B, which is located in the third intron. As is typical for genes encoding ribosomal proteins, there are multiple processed pseudogenes of this gene dispersed throughout the genome.
728658 RPL13AP5 ENSG00000236552 ribosomal protein L13a pseudogene 5 NA
6143 RPL19 ENSG00000108298 ribosomal protein L19 Ribosomes, the organelles that catalyze protein synthesis, consist of a small 40S subunit and a large 60S subunit. Together these subunits are composed of 4 RNA species and approximately 80 structurally distinct proteins. This gene encodes a ribosomal protein that is a component of the 60S subunit. The protein belongs to the L19E family of ribosomal proteins. It is located in the cytoplasm. As is typical for genes encoding ribosomal proteins, there are multiple processed pseudogenes of this gene dispersed through the genome.
1674 DES ENSG00000175084 desmin This gene encodes a muscle-specific class III intermediate filament. Homopolymers of this protein form a stable intracytoplasmic filamentous network connecting myofibrils to each other and to the plasma membrane. Mutations in this gene are associated with desmin-related myopathy, a familial cardiac and skeletal myopathy (CSM), and with distal myopathies.
6158 RPL28 ENSG00000108107 ribosomal protein L28 Ribosomes, the organelles that catalyze protein synthesis, consist of a small 40S subunit and a large 60S subunit. Together these subunits are composed of 4 RNA species and approximately 80 structurally distinct proteins. This gene encodes a ribosomal protein that is a component of the 60S subunit. The protein belongs to the L28E family of ribosomal proteins. It is located in the cytoplasm. Variable expression of this gene in colorectal cancers compared to adjacent normal tissues has been observed, although no correlation between the level of expression and the severity of the disease has been found. As is typical for genes encoding ribosomal proteins, there are multiple processed pseudogenes of this gene dispersed through the genome. Alternative splicing results in multiple transcript variants encoding distinct isoforms.
6206 RPS12 ENSG00000112306 ribosomal protein S12 Ribosomes, the organelles that catalyze protein synthesis, consist of a small 40S subunit and a large 60S subunit. Together these subunits are composed of 4 RNA species and approximately 80 structurally distinct proteins. This gene encodes a ribosomal protein that is a component of the 40S subunit. The protein belongs to the S12E family of ribosomal proteins. It is located in the cytoplasm. Increased expression of this gene in colorectal cancers compared to matched normal colonic mucosa has been observed. As is typical for genes encoding ribosomal proteins, there are multiple processed pseudogenes of this gene dispersed through the genome.
6136 RPL12 ENSG00000197958 ribosomal protein L12 Ribosomes, the organelles that catalyze protein synthesis, consist of a small 40S subunit and a large 60S subunit. Together these subunits are composed of 4 RNA species and approximately 80 structurally distinct proteins. This gene encodes a ribosomal protein that is a component of the 60S subunit. The protein belongs to the L11P family of ribosomal proteins. It is located in the cytoplasm. The protein binds directly to the 26S rRNA. This gene is co-transcribed with the U65 snoRNA, which is located in its fourth intron. As is typical for genes encoding ribosomal proteins, there are multiple processed pseudogenes of this gene dispersed through the genome.
6132 RPL8 ENSG00000161016 ribosomal protein L8 Ribosomes, the organelles that catalyze protein synthesis, consist of a small 40S subunit and a large 60S subunit. Together these subunits are composed of 4 RNA species and approximately 80 structurally distinct proteins. This gene encodes a ribosomal protein that is a component of the 60S subunit. The protein belongs to the L2P family of ribosomal proteins. It is located in the cytoplasm. In rat, the protein associates with the 5.8S rRNA, very likely participates in the binding of aminoacyl-tRNA, and is a constituent of the elongation factor 2-binding site at the ribosomal subunit interface. Alternatively spliced transcript variants encoding the same protein exist. As is typical for genes encoding ribosomal proteins, there are multiple processed pseudogenes of this gene dispersed through the genome.
6222 RPS18 ENSG00000231500 ribosomal protein S18 Ribosomes, the organelles that catalyze protein synthesis, consist of a small 40S subunit and a large 60S subunit. Together these subunits are composed of 4 RNA species and approximately 80 structurally distinct proteins. This gene encodes a ribosomal protein that is a component of the 40S subunit. The protein belongs to the S13P family of ribosomal proteins. It is located in the cytoplasm. The gene product of the E. coli ortholog (ribosomal protein S13) is involved in the binding of fMet-tRNA, and thus, in the initiation of translation. This gene is an ortholog of mouse Ke3. As is typical for genes encoding ribosomal proteins, there are multiple processed pseudogenes of this gene dispersed through the genome.
6229 RPS24 ENSG00000138326 ribosomal protein S24 Ribosomes, the organelles that catalyze protein synthesis, consist of a small 40S subunit and a large 60S subunit. Together these subunits are composed of 4 RNA species and approximately 80 structurally distinct proteins. This gene encodes a ribosomal protein that is a component of the 40S subunit. The protein belongs to the S24E family of ribosomal proteins. It is located in the cytoplasm. Multiple transcript variants encoding different isoforms have been found for this gene. As is typical for genes encoding ribosomal proteins, there are multiple processed pseudogenes of this gene dispersed through the genome. Mutations in this gene result in Diamond-Blackfan anemia.
6224 RPS20 ENSG00000008988 ribosomal protein S20 Ribosomes, the organelles that catalyze protein synthesis, consist of a small 40S subunit and a large 60S subunit. Together these subunits are composed of 4 RNA species and approximately 80 structurally distinct proteins. This gene encodes a ribosomal protein that is a component of the 40S subunit. The protein belongs to the S10P family of ribosomal proteins. It is located in the cytoplasm. This gene is co-transcribed with the small nucleolar RNA gene U54, which is located in its second intron. As is typical for genes encoding ribosomal proteins, there are multiple processed pseudogenes of this gene dispersed through the genome. Two transcript variants encoding different isoforms have been identified for this gene.
10399 RACK1 ENSG00000204628 receptor for activated C kinase 1 NA
6217 RPS16 ENSG00000105193 ribosomal protein S16 Ribosomes, the organelles that catalyze protein synthesis, consist of a small 40S subunit and a large 60S subunit. Together these subunits are composed of 4 RNA species and approximately 80 structurally distinct proteins. This gene encodes a ribosomal protein that is a component of the 40S subunit. The protein belongs to the S9P family of ribosomal proteins. It is located in the cytoplasm. As is typical for genes encoding ribosomal proteins, there are multiple processed pseudogenes of this gene dispersed through the genome.
4629 MYH11 ENSG00000133392 myosin, heavy chain 11, smooth muscle The protein encoded by this gene is a smooth muscle myosin belonging to the myosin heavy chain family. The gene product is a subunit of a hexameric protein that consists of two heavy chain subunits and two pairs of non-identical light chain subunits. It functions as a major contractile protein, converting chemical energy into mechanical energy through the hydrolysis of ATP. The gene encoding a human ortholog of rat NUDE1 is transcribed from the reverse strand of this gene, and its 3’ end overlaps with that of the latter. The pericentric inversion of chromosome 16 [inv(16)(p13q22)] produces a chimeric transcript that encodes a protein consisting of the first 165 residues from the N terminus of core-binding factor beta in a fusion with the C-terminal portion of the smooth muscle myosin heavy chain. This chromosomal rearrangement is associated with acute myeloid leukemia of the M4Eo subtype. Alternative splicing generates isoforms that are differentially expressed, with ratios changing during muscle cell maturation. Alternatively spliced transcript variants encoding different isoforms have been identified.
6168 RPL37A ENSG00000197756 ribosomal protein L37a Ribosomes, the organelles that catalyze protein synthesis, consist of a small 40S subunit and a large 60S subunit. Together these subunits are composed of 4 RNA species and approximately 80 structurally distinct proteins. This gene encodes a ribosomal protein that is a component of the 60S subunit. The protein belongs to the L37AE family of ribosomal proteins. It is located in the cytoplasm. The protein contains a C4-type zinc finger-like domain. As is typical for genes encoding ribosomal proteins, there are multiple processed pseudogenes of this gene dispersed through the genome.
6161 RPL32 ENSG00000144713 ribosomal protein L32 Ribosomes, the organelles that catalyze protein synthesis, consist of a small 40S subunit and a large 60S subunit. Together these subunits are composed of 4 RNA species and approximately 80 structurally distinct proteins. This gene encodes a ribosomal protein that is a component of the 60S subunit. The protein belongs to the L32E family of ribosomal proteins. It is located in the cytoplasm. Although some studies have mapped this gene to 3q13.3-q21, it is believed to map to 3p25-p24. As is typical for genes encoding ribosomal proteins, there are multiple processed pseudogenes of this gene dispersed through the genome. Alternatively spliced transcript variants encoding the same protein have been observed for this gene.
ENSG00000273149 RP11-290D2.6 ENSG00000273149 NA NA
6167 RPL37 ENSG00000145592 ribosomal protein L37 Ribosomes, the organelles that catalyze protein synthesis, consist of a small 40S subunit and a large 60S subunit. Together these subunits are composed of 4 RNA species and approximately 80 structurally distinct proteins. This gene encodes a ribosomal protein that is a component of the 60S subunit. The protein belongs to the L37E family of ribosomal proteins. It is located in the cytoplasm. The protein contains a C2C2-type zinc finger-like motif. As is typical for genes encoding ribosomal proteins, there are multiple processed pseudogenes of this gene dispersed through the genome.
6141 RPL18 ENSG00000063177 ribosomal protein L18 Ribosomes, the organelles that catalyze protein synthesis, consist of a small 40S subunit and a large 60S subunit. Together these subunits are composed of 4 RNA species and approximately 80 structurally distinct proteins. This gene encodes a member of the L18E family of ribosomal proteins that is a component of the 60S subunit. As is typical for genes encoding ribosomal proteins, there are multiple processed pseudogenes of this gene dispersed through the genome. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene.
6208 RPS14 ENSG00000164587 ribosomal protein S14 Ribosomes, the organelles that catalyze protein synthesis, consist of a small 40S subunit and a large 60S subunit. Together these subunits are composed of 4 RNA species and approximately 80 structurally distinct proteins. This gene encodes a ribosomal protein that is a component of the 40S subunit. The protein belongs to the S11P family of ribosomal proteins. It is located in the cytoplasm. Transcript variants utilizing alternative transcription initiation sites have been described in the literature. As is typical for genes encoding ribosomal proteins, there are multiple processed pseudogenes of this gene dispersed through the genome. In Chinese hamster ovary cells, mutations in this gene can lead to resistance to emetine, a protein synthesis inhibitor. Multiple alternatively spliced transcript variants encoding the same protein have been found for this gene.
29997 GLTSCR2 ENSG00000105373 glioma tumor suppressor candidate region gene 2 NA
ENSG00000244398 RP11-466H18.1 ENSG00000244398 NA NA
ENSG00000232573 RPL3P4 ENSG00000232573 ribosomal protein L3 pseudogene 4 NA
6227 RPS21 ENSG00000171858 ribosomal protein S21 Ribosomes, the organelles that catalyze protein synthesis, consist of a small 40S subunit and a large 60S subunit. Together these subunits are composed of 4 RNA species and approximately 80 structurally distinct proteins. This gene encodes a ribosomal protein that is a component of the 40S subunit. The protein belongs to the S21E family of ribosomal proteins. It is located in the cytoplasm. Alternative splice variants that encode different protein isoforms have been described, but their existence has not been verified. As is typical for genes encoding ribosomal proteins, there are multiple processed pseudogenes of this gene dispersed through the genome.
6233 RPS27A ENSG00000143947 ribosomal protein S27a Ubiquitin, a highly conserved protein that has a major role in targeting cellular proteins for degradation by the 26S proteasome, is synthesized as a precursor protein consisting of either polyubiquitin chains or a single ubiquitin fused to an unrelated protein. This gene encodes a fusion protein consisting of ubiquitin at the N terminus and ribosomal protein S27a at the C terminus. When expressed in yeast, the protein is post-translationally processed, generating free ubiquitin monomer and ribosomal protein S27a. Ribosomal protein S27a is a component of the 40S subunit of the ribosome and belongs to the S27AE family of ribosomal proteins. It contains C4-type zinc finger domains and is located in the cytoplasm. Pseudogenes derived from this gene are present in the genome. As with ribosomal protein S27a, ribosomal protein L40 is also synthesized as a fusion protein with ubiquitin; similarly, ribosomal protein S30 is synthesized as a fusion protein with the ubiquitin-like protein fubi. Multiple alternatively spliced transcript variants that encode the same proteins have been identified.
4736 RPL10A ENSG00000198755 ribosomal protein L10a Ribosomes, the organelles that catalyze protein synthesis, consist of a small 40S subunit and a large 60S subunit. Together these subunits are composed of 4 RNA species and approximately 80 structurally distinct proteins. This gene encodes a ribosomal protein that is a component of the 60S subunit. The protein belongs to the L1P family of ribosomal proteins. It is located in the cytoplasm. The expression of this gene is downregulated in the thymus by cyclosporin-A (CsA), an immunosuppressive drug. Studies in mice have shown that the expression of the ribosomal protein L10a gene is downregulated in neural precursor cells during development. This gene previously was referred to as NEDD6 (neural precursor cell expressed, developmentally downregulated 6), but it has been renamed RPL10A (ribosomal protein 10a). As is typical for genes encoding ribosomal proteins, there are multiple processed pseudogenes of this gene dispersed through the genome.
3487 IGFBP4 ENSG00000141753 insulin like growth factor binding protein 4 This gene is a member of the insulin-like growth factor binding protein (IGFBP) family and encodes a protein with an IGFBP domain and a thyroglobulin type-I domain. The protein binds both insulin-like growth factors (IGFs) I and II and circulates in the plasma in both glycosylated and non-glycosylated forms. Binding of this protein prolongs the half-life of the IGFs and alters their interaction with cell surface receptors.
ENSG00000196205 EEF1A1P5 ENSG00000196205 eukaryotic translation elongation factor 1 alpha 1 pseudogene 5 NA
6164 RPL34 ENSG00000109475 ribosomal protein L34 Ribosomes, the organelles that catalyze protein synthesis, consist of a small 40S subunit and a large 60S subunit. Together these subunits are composed of 4 RNA species and approximately 80 structurally distinct proteins. This gene encodes a ribosomal protein that is a component of the 60S subunit. The protein belongs to the L34E family of ribosomal proteins. It is located in the cytoplasm. This gene originally was thought to be located at 17q21, but it has been mapped to 4q. Overexpression of this gene has been observed in some cancer cells. Alternative splicing results in multiple transcript variants, all encoding the same isoform. As is typical for genes encoding ribosomal proteins, there are multiple processed pseudogenes of this gene dispersed through the genome.
1360 CPB1 ENSG00000153002 carboxypeptidase B1 Three different procarboxypeptidases A and two different procarboxypeptidases B have been isolated. The B1 and B2 forms differ from each other mainly in isoelectric point. Carboxypeptidase B1 is a highly tissue-specific protein and is a useful serum marker for acute pancreatitis and dysfunction of pancreatic transplants. It is not elevated in pancreatic carcinoma.
4155 MBP ENSG00000197971 myelin basic protein The protein encoded by the classic MBP gene is a major constituent of the myelin sheath of oligodendrocytes and Schwann cells in the nervous system. However, MBP-related transcripts are also present in the bone marrow and the immune system. These mRNAs arise from the long MBP gene (otherwise called ‘Golli-MBP’) that contains 3 additional exons located upstream of the classic MBP exons. Alternative splicing from the Golli and the MBP transcription start sites gives rise to 2 sets of MBP-related transcripts and gene products. The Golli mRNAs contain 3 exons unique to Golli-MBP, spliced in-frame to 1 or more MBP exons. They encode hybrid proteins that have N-terminal Golli aa sequence linked to MBP aa sequence. The second family of transcripts contain only MBP exons and produce the well characterized myelin basic proteins. This complex gene structure is conserved among species suggesting that the MBP transcription unit is an integral part of the Golli transcription unit and that this arrangement is important for the function and/or regulation of these genes.
ENSG00000229344 MTCO2P12 ENSG00000229344 MT-CO2 pseudogene 12 NA
ENSG00000225630 MTND2P28 ENSG00000225630 mitochondrially encoded NADH:ubiquinone oxidoreductase core subunit 2 pseudogene 28 NA
ENSG00000213442 RPL18AP3 ENSG00000213442 ribosomal protein L18a pseudogene 3 NA
ENSG00000227097 RPS28P7 ENSG00000227097 ribosomal protein S28 pseudogene 7 NA
3921 RPSA ENSG00000168028 ribosomal protein SA Laminins, a family of extracellular matrix glycoproteins, are the major noncollagenous constituent of basement membranes. They have been implicated in a wide variety of biological processes including cell adhesion, differentiation, migration, signaling, neurite outgrowth and metastasis. Many of the effects of laminin are mediated through interactions with cell surface receptors. These receptors include members of the integrin family, as well as non-integrin laminin-binding proteins. This gene encodes a high-affinity, non-integrin family, laminin receptor 1. This receptor has been variously called 67 kD laminin receptor, 37 kD laminin receptor precursor (37LRP) and p40 ribosome-associated protein. The amino acid sequence of laminin receptor 1 is highly conserved through evolution, suggesting a key biological function. It has been observed that the level of the laminin receptor transcript is higher in colon carcinoma tissue and lung cancer cell line than their normal counterparts. Also, there is a correlation between the upregulation of this polypeptide in cancer cells and their invasive and metastatic phenotype. Multiple copies of this gene exist; however, most of them are pseudogenes thought to have arisen from retropositional events. Two alternatively spliced transcript variants encoding the same protein have been found for this gene.
9045 RPL14 ENSG00000188846 ribosomal protein L14 Ribosomes, the organelles that catalyze protein synthesis, consist of a small 40S subunit and a large 60S subunit. Together these subunits are composed of 4 RNA species and approximately 80 structurally distinct proteins. This gene encodes a ribosomal protein that is a component of the 60S subunit. The protein belongs to the L14E family of ribosomal proteins. It contains a basic region-leucine zipper (bZIP)-like domain. The protein is located in the cytoplasm. This gene contains a trinucleotide (GCT) repeat tract whose length is highly polymorphic; these triplet repeats result in a stretch of alanine residues in the encoded protein. Transcript variants utilizing alternative polyA signals and alternative 5’-terminal exons exist but all encode the same protein. As is typical for genes encoding ribosomal proteins, there are multiple processed pseudogenes of this gene dispersed through the genome.
6157 RPL27A ENSG00000166441 ribosomal protein L27a Ribosomes, the organelles that catalyze protein synthesis, consist of a small 40S subunit and a large 60S subunit. Together these subunits are composed of 4 RNA species and approximately 80 structurally distinct proteins. This gene encodes a ribosomal protein that is a component of the 60S subunit. The protein belongs to the L15P family of ribosomal proteins. It is located in the cytoplasm. Variable expression of this gene in colorectal cancers compared to adjacent normal tissues has been observed, although no correlation between the level of expression and the severity of the disease has been found. As is typical for genes encoding ribosomal proteins, multiple processed pseudogenes derived from this gene are dispersed through the genome.
6228 RPS23 ENSG00000186468 ribosomal protein S23 Ribosomes, the organelles that catalyze protein synthesis, consist of a small 40S subunit and a large 60S subunit. Together these subunits are composed of 4 RNA species and approximately 80 structurally distinct proteins. This gene encodes a ribosomal protein that is a component of the 40S subunit. The protein belongs to the S12P family of ribosomal proteins. It is located in the cytoplasm. The protein shares significant amino acid similarity with S. cerevisiae ribosomal protein S28. As is typical for genes encoding ribosomal proteins, there are multiple processed pseudogenes of this gene dispersed through the genome.
6159 RPL29 ENSG00000162244 ribosomal protein L29 Ribosomes, the organelles that catalyze protein synthesis, consist of a small 40S subunit and a large 60S subunit. Together these subunits are composed of 4 RNA species and approximately 80 structurally distinct proteins. This gene encodes a cytoplasmic ribosomal protein that is a component of the 60S subunit. The protein belongs to the L29E family of ribosomal proteins. The protein is also a peripheral membrane protein expressed on the cell surface that directly binds heparin. Although this gene was previously reported to map to 3q29-qter, it is believed that it is located at 3p21.3-p21.2. As is typical for genes encoding ribosomal proteins, there are multiple processed pseudogenes of this gene dispersed through the genome.
7273 TTN ENSG00000155657 titin This gene encodes a large abundant protein of striated muscle. The product of this gene is divided into two regions, an N-terminal I-band and a C-terminal A-band. The I-band, which is the elastic part of the molecule, contains two regions of tandem immunoglobulin domains on either side of a PEVK region that is rich in proline, glutamate, valine and lysine. The A-band, which is thought to act as a protein-ruler, contains a mixture of immunoglobulin and fibronectin repeats, and possesses kinase activity. An N-terminal Z-disc region and a C-terminal M-line region bind to the Z-line and M-line of the sarcomere, respectively, so that a single titin molecule spans half the length of a sarcomere. Titin also contains binding sites for muscle associated proteins so it serves as an adhesion template for the assembly of contractile machinery in muscle cells. It has also been identified as a structural protein for chromosomes. Alternative splicing of this gene results in multiple transcript variants. Considerable variability exists in the I-band, the M-line and the Z-disc regions of titin. Variability in the I-band region contributes to the differences in elasticity of different titin isoforms and, therefore, to the differences in elasticity of different muscle types. Mutations in this gene are associated with familial hypertrophic cardiomyopathy 9, and autoantibodies to titin are produced in patients with the autoimmune disease scleroderma.
6223 RPS19 ENSG00000105372 ribosomal protein S19 Ribosomes, the organelles that catalyze protein synthesis, consist of a small 40S subunit and a large 60S subunit. Together these subunits are composed of 4 RNA species and approximately 80 structurally distinct proteins. This gene encodes a ribosomal protein that is a component of the 40S subunit. The protein belongs to the S19E family of ribosomal proteins. It is located in the cytoplasm. Mutations in this gene cause Diamond-Blackfan anemia (DBA), a constitutional erythroblastopenia characterized by absent or decreased erythroid precursors, in a subset of patients. This suggests a possible extra-ribosomal function for this gene in erythropoietic differentiation and proliferation, in addition to its ribosomal function. Higher expression levels of this gene in some primary colon carcinomas compared to matched normal colon tissues have been observed. As is typical for genes encoding ribosomal proteins, there are multiple processed pseudogenes of this gene dispersed through the genome.
2495 FTH1 ENSG00000167996 ferritin heavy chain 1 This gene encodes the heavy subunit of ferritin, the major intracellular iron storage protein in prokaryotes and eukaryotes. It is composed of 24 subunits of the heavy and light ferritin chains. Variation in ferritin subunit composition may affect the rates of iron uptake and release in different tissues. A major function of ferritin is the storage of iron in a soluble and nontoxic state. Defects in ferritin proteins are associated with several neurodegenerative diseases. This gene has multiple pseudogenes. Several alternatively spliced transcript variants have been observed, but their biological validity has not been determined.
6156 RPL30 ENSG00000156482 ribosomal protein L30 Ribosomes, the organelles that catalyze protein synthesis, consist of a small 40S subunit and a large 60S subunit. Together these subunits are composed of 4 RNA species and approximately 80 structurally distinct proteins. This gene encodes a ribosomal protein that is a component of the 60S subunit. The protein belongs to the L30E family of ribosomal proteins. It is located in the cytoplasm. This gene is co-transcribed with the U72 small nucleolar RNA gene, which is located in its fourth intron. As is typical for genes encoding ribosomal proteins, there are multiple processed pseudogenes of this gene dispersed through the genome.
3615 IMPDH2 ENSG00000178035 IMP (inosine 5’-monophosphate) dehydrogenase 2 This gene encodes the rate-limiting enzyme in the de novo guanine nucleotide biosynthesis. It is thus involved in maintaining cellular guanine deoxy- and ribonucleotide pools needed for DNA and RNA synthesis. The encoded protein catalyzes the NAD-dependent oxidation of inosine-5’-monophosphate into xanthine-5’-monophosphate, which is then converted into guanosine-5’-monophosphate. This gene is up-regulated in some neoplasms, suggesting it may play a role in malignant transformation.
6193 RPS5 ENSG00000083845 ribosomal protein S5 Ribosomes, the organelles that catalyze protein synthesis, consist of a small 40S subunit and a large 60S subunit. Together these subunits are composed of 4 RNA species and approximately 80 structurally distinct proteins. This gene encodes a ribosomal protein that is a component of the 40S subunit. The protein belongs to the S7P family of ribosomal proteins. It is located in the cytoplasm. Variable expression of this gene in colorectal cancers compared to adjacent normal tissues has been observed, although no correlation between the level of expression and the severity of the disease has been found. As is typical for genes encoding ribosomal proteins, there are multiple processed pseudogenes of this gene dispersed through the genome.
6165 RPL35A ENSG00000182899 ribosomal protein L35a Ribosomes, the organelles that catalyze protein synthesis, consist of a small 40S subunit and a large 60S subunit. Together these subunits are composed of 4 RNA species and approximately 80 structurally distinct proteins. This gene encodes a ribosomal protein that is a component of the 60S subunit. The protein belongs to the L35AE family of ribosomal proteins. It is located in the cytoplasm. The rat protein has been shown to bind to both initiator and elongator tRNAs, and thus, it is located at the P site, or P and A sites, of the ribosome. Although this gene was originally mapped to chromosome 18, it has been established that it is located at 3q29-qter. Alternative splicing results in multiple transcript variants. As is typical for genes encoding ribosomal proteins, there are multiple processed pseudogenes of this gene dispersed through the genome.
6235 RPS29 ENSG00000213741 ribosomal protein S29 Ribosomes, the organelles that catalyze protein synthesis, consist of a small 40S subunit and a large 60S subunit. Together these subunits are composed of 4 RNA species and approximately 80 structurally distinct proteins. This gene encodes a ribosomal protein that is a component of the 40S subunit and a member of the S14P family of ribosomal proteins. The protein, which contains a C2-C2 zinc finger-like domain that can bind to zinc, can enhance the tumor suppressor activity of Ras-related protein 1A (KREV1). It is located in the cytoplasm. Variable expression of this gene in colorectal cancers compared to adjacent normal tissues has been observed, although no correlation between the level of expression and the severity of the disease has been found. As is typical for genes encoding ribosomal proteins, there are multiple processed pseudogenes of this gene dispersed through the genome. Alternatively spliced transcript variants encoding different isoforms have been found for this gene.
1357 CPA1 ENSG00000091704 carboxypeptidase A1 This gene encodes a member of the carboxypeptidase A family of zinc metalloproteases. This enzyme is produced in the pancreas and preferentially cleaves C-terminal branched-chain and aromatic amino acids from dietary proteins. This gene and several family members are present in a gene cluster on chromosome 7. Mutations in this gene may be linked to chronic pancreatitis, while elevated protein levels may be associated with pancreatic cancer.
1056 CEL ENSG00000170835 carboxyl ester lipase The protein encoded by this gene is a glycoprotein secreted from the pancreas into the digestive tract and from the lactating mammary gland into human milk. The physiological role of this protein is in cholesterol and lipid-soluble vitamin ester hydrolysis and absorption. This encoded protein promotes large chylomicron production in the intestine. Also its presence in plasma suggests its interactions with cholesterol and oxidized lipoproteins to modulate the progression of atherosclerosis. In pancreatic tumoral cells, this encoded protein is thought to be sequestrated within the Golgi compartment and is probably not secreted. This gene contains a variable number of tandem repeat (VNTR) polymorphism in the coding region that may influence the function of the encoded protein.
6125 RPL5 ENSG00000122406 ribosomal protein L5 Ribosomes, the organelles that catalyze protein synthesis, consist of a small 40S subunit and a large 60S subunit. Together these subunits are composed of 4 RNA species and approximately 80 structurally distinct proteins. This gene encodes a ribosomal protein that is a component of the 60S subunit. The protein belongs to the L18P family of ribosomal proteins. It is located in the cytoplasm. The protein binds 5S rRNA to form a stable complex called the 5S ribonucleoprotein particle (RNP), which is necessary for the transport of nonribosome-associated cytoplasmic 5S rRNA to the nucleolus for assembly into ribosomes. The protein interacts specifically with the beta subunit of casein kinase II. Variable expression of this gene in colorectal cancers compared to adjacent normal tissues has been observed, although no correlation between the level of expression and the severity of the disease has been found. This gene is co-transcribed with the small nucleolar RNA gene U21, which is located in its fifth intron. As is typical for genes encoding ribosomal proteins, there are multiple processed pseudogenes of this gene dispersed through the genome.
71 ACTG1 ENSG00000184009 actin gamma 1 Actins are highly conserved proteins that are involved in various types of cell motility, and maintenance of the cytoskeleton. In vertebrates, three main groups of actin isoforms, alpha, beta and gamma have been identified. The alpha actins are found in muscle tissues and are a major constituent of the contractile apparatus. The beta and gamma actins co-exist in most cell types as components of the cytoskeleton, and as mediators of internal cell motility. Actin, gamma 1, encoded by this gene, is a cytoplasmic actin found in non-muscle cells. Mutations in this gene are associated with DFNA20/26, a subtype of autosomal dominant non-syndromic sensorineural progressive hearing loss. Alternative splicing results in multiple transcript variants.
125144 LRRC75A-AS1 ENSG00000175061 LRRC75A antisense RNA 1 NA
6209 RPS15 ENSG00000115268 ribosomal protein S15 Ribosomes, the organelles that catalyze protein synthesis, consist of a small 40S subunit and a large 60S subunit. Together these subunits are composed of 4 RNA species and approximately 80 structurally distinct proteins. This gene encodes a ribosomal protein that is a component of the 40S subunit. The protein belongs to the S19P family of ribosomal proteins. It is located in the cytoplasm. This gene has been found to be activated in various tumors, such as insulinomas, esophageal cancers, and colon cancers. As is typical for genes encoding ribosomal proteins, there are multiple processed pseudogenes of this gene dispersed through the genome. Alternative splicing results in multiple transcript variants.
1933 EEF1B2 ENSG00000114942 eukaryotic translation elongation factor 1 beta 2 This gene encodes a translation elongation factor. The protein is a guanine nucleotide exchange factor involved in the transfer of aminoacylated tRNAs to the ribosome. Alternative splicing results in three transcript variants which differ only in the 5’ UTR.
ENSG00000242071 RPL7AP6 ENSG00000242071 ribosomal protein L7a pseudogene 6 NA
5644 PRSS1 ENSG00000204983 protease, serine 1 This gene encodes a trypsinogen, which is a member of the trypsin family of serine proteases. This enzyme is secreted by the pancreas and cleaved to its active form in the small intestine. It is active on peptide linkages involving the carboxyl group of lysine or arginine. Mutations in this gene are associated with hereditary pancreatitis. This gene and several other trypsinogen genes are localized to the T cell receptor beta locus on chromosome 7.
2813 GP2 ENSG00000169347 glycoprotein 2 This gene encodes an integral membrane protein that is secreted from intracellular zymogen granules and associates with the plasma membrane via glycosylphosphatidylinositol (GPI) linkage. The encoded protein binds pathogens such as enterobacteria, thereby playing an important role in the innate immune response. The C-terminus of this protein is related to the C-terminus of the protein encoded by the neighboring gene, uromodulin (UMOD). Alternative splicing results in multiple transcript variants.
ENSG00000240342 RPS2P5 ENSG00000240342 ribosomal protein S2 pseudogene 5 NA
6169 RPL38 ENSG00000172809 ribosomal protein L38 Ribosomes, the organelles that catalyze protein synthesis, consist of a small 40S subunit and a large 60S subunit. Together these subunits are composed of 4 RNA species and approximately 80 structurally distinct proteins. This gene encodes a ribosomal protein that is a component of the 60S subunit. The protein belongs to the L38E family of ribosomal proteins. It is located in the cytoplasm. Alternative splice variants have been identified, both encoding the same protein. As is typical for genes encoding ribosomal proteins, there are multiple processed pseudogenes of this gene dispersed through the genome, including one located in the promoter region of the type 1 angiotensin II receptor gene.
11224 RPL35 ENSG00000136942 ribosomal protein L35 Ribosomes, the organelles that catalyze protein synthesis, consist of a small 40S subunit and a large 60S subunit. Together these subunits are composed of 4 RNA species and approximately 80 structurally distinct proteins. This gene encodes a ribosomal protein that is a component of the 60S subunit. The protein belongs to the L29P family of ribosomal proteins. It is located in the cytoplasm. As is typical for genes encoding ribosomal proteins, there are multiple processed pseudogenes of this gene dispersed through the genome.
302 ANXA2 ENSG00000182718 annexin A2 This gene encodes a member of the annexin family. Members of this calcium-dependent phospholipid-binding protein family play a role in the regulation of cellular growth and in signal transduction pathways. This protein functions as an autocrine factor which heightens osteoclast formation and bone resorption. This gene has three pseudogenes located on chromosomes 4, 9 and 10, respectively. Multiple alternatively spliced transcript variants encoding different isoforms have been found for this gene.
84525 HOPX ENSG00000171476 HOP homeobox The protein encoded by this gene is a homeodomain protein that lacks certain conserved residues required for DNA binding. It was reported that choriocarcinoma cell lines and tissues failed to express this gene, which suggested the possible involvement of this gene in malignant conversion of placental trophoblasts. Studies in mice suggest that this protein may interact with serum response factor (SRF) and modulate SRF-dependent cardiac-specific gene expression and cardiac development. Multiple alternatively spliced transcript variants have been identified for this gene.
4878 NPPA ENSG00000175206 natriuretic peptide A The protein encoded by this gene belongs to the natriuretic peptide family. Natriuretic peptides are implicated in the control of extracellular fluid volume and electrolyte homeostasis. This protein is synthesized as a large precursor (containing a signal peptide), which is processed to release a peptide from the N-terminus with similarity to vasoactive peptide, cardiodilatin, and another peptide from the C-terminus with natriuretic-diuretic activity. Mutations in this gene have been associated with atrial fibrillation familial type 6. This gene is located adjacent to another member of the natriuretic family of peptides on chromosome 1.
4192 MDK ENSG00000110492 midkine (neurite growth-promoting factor 2) This gene encodes a member of a small family of secreted growth factors that binds heparin and responds to retinoic acid. The encoded protein promotes cell growth, migration, and angiogenesis, in particular during tumorigenesis. This gene has been targeted as a therapeutic for a variety of different disorders. Alternatively spliced transcript variants encoding multiple isoforms have been observed.
6171 RPL41 ENSG00000229117 ribosomal protein L41 Ribosomes, the organelles that catalyze protein synthesis, consist of a small 40S subunit and a large 60S subunit. Together these subunits are composed of 4 RNA species and approximately 80 structurally distinct proteins. This gene encodes a ribosomal protein that is a component of the 60S subunit. The protein, which shares sequence similarity with the yeast ribosomal protein YL41, belongs to the L41E family of ribosomal proteins. It is located in the cytoplasm. The protein can interact with the beta subunit of protein kinase CKII and can stimulate the phosphorylation of DNA topoisomerase II-alpha by CKII. Two alternative splice variants have been identified, both encoding the same protein. As is typical for genes encoding ribosomal proteins, there are multiple processed pseudogenes of this gene dispersed through the genome.
ENSG00000137970 RPL7P9 ENSG00000137970 ribosomal protein L7 pseudogene 9 NA
6155 RPL27 ENSG00000131469 ribosomal protein L27 Ribosomes, the organelles that catalyze protein synthesis, consist of a small 40S subunit and a large 60S subunit. Together these subunits are composed of 4 RNA species and approximately 80 structurally distinct proteins. This gene encodes a ribosomal protein that is a component of the 60S subunit. The protein belongs to the L27E family of ribosomal proteins. It is located in the cytoplasm. As is typical for genes encoding ribosomal proteins, there are multiple processed pseudogenes of this gene dispersed through the genome.
1277 COL1A1 ENSG00000108821 collagen type I alpha 1 This gene encodes the pro-alpha1 chains of type I collagen whose triple helix comprises two alpha1 chains and one alpha2 chain. Type I is a fibril-forming collagen found in most connective tissues and is abundant in bone, cornea, dermis and tendon. Mutations in this gene are associated with osteogenesis imperfecta types I-IV, Ehlers-Danlos syndrome type VIIA, Ehlers-Danlos syndrome Classical type, Caffey Disease and idiopathic osteoporosis. Reciprocal translocations between chromosomes 17 and 22, where this gene and the gene for platelet-derived growth factor beta are located, are associated with a particular type of skin tumor called dermatofibrosarcoma protuberans, resulting from unregulated expression of the growth factor. Two transcripts, resulting from the use of alternate polyadenylation signals, have been identified for this gene.
291 SLC25A4 ENSG00000151729 solute carrier family 25 member 4 This gene is a member of the mitochondrial carrier subfamily of solute carrier protein genes. The product of this gene functions as a gated pore that translocates ADP from the cytoplasm into the mitochondrial matrix and ATP from the mitochondrial matrix into the cytoplasm. The protein forms a homodimer embedded in the inner mitochondrial membrane. Mutations in this gene have been shown to result in autosomal dominant progressive external ophthalmoplegia and familial hypertrophic cardiomyopathy.
1938 EEF2 ENSG00000167658 eukaryotic translation elongation factor 2 This gene encodes a member of the GTP-binding translation elongation factor family. This protein is an essential factor for protein synthesis. It promotes the GTP-dependent translocation of the nascent protein chain from the A-site to the P-site of the ribosome. This protein is completely inactivated by EF-2 kinase phosphorylation.
10136 CELA3A ENSG00000142789 chymotrypsin like elastase family member 3A Elastases form a subfamily of serine proteases that hydrolyze many proteins in addition to elastin. Humans have six elastase genes which encode the structurally similar proteins elastase 1, 2, 2A, 2B, 3A, and 3B. Unlike other elastases, elastase 3A has little elastolytic activity. Like most of the human elastases, elastase 3A is secreted from the pancreas as a zymogen and, like other serine proteases such as trypsin, chymotrypsin and kallikrein, it has a digestive function in the intestine. Elastase 3A preferentially cleaves proteins after alanine residues. Elastase 3A may also function in the intestinal transport and metabolism of cholesterol. Both elastase 3A and elastase 3B have been referred to as protease E and as elastase 1.
ENSG00000213553 RPLP0P6 ENSG00000213553 ribosomal protein, large, P0 pseudogene 6 NA
6201 RPS7 ENSG00000171863 ribosomal protein S7 Ribosomes, the organelles that catalyze protein synthesis, consist of a small 40S subunit and a large 60S subunit. Together these subunits are composed of 4 RNA species and approximately 80 structurally distinct proteins. This gene encodes a ribosomal protein that is a component of the 40S subunit. The protein belongs to the S7E family of ribosomal proteins. It is located in the cytoplasm. As is typical for genes encoding ribosomal proteins, there are multiple processed pseudogenes of this gene dispersed through the genome.
ENSG00000266844 RP11-862L9.3 ENSG00000266844 NA NA
ENSG00000227081 RP11-543P15.1 ENSG00000227081 NA NA
ENSG00000230202 RP11-632C17__A.1 ENSG00000230202 NA NA
6154 RPL26 ENSG00000161970 ribosomal protein L26 Ribosomes, the organelles that catalyze protein synthesis, consist of a small 40S subunit and a large 60S subunit. Together these subunits are composed of 4 RNA species and approximately 80 structurally distinct proteins. This gene encodes a ribosomal protein that is a component of the 60S subunit. The protein belongs to the L24P family of ribosomal proteins. It is located in the cytoplasm. As is typical for genes encoding ribosomal proteins, there are multiple processed pseudogenes of this gene dispersed through the genome. Mutations in this gene result in Diamond-Blackfan anemia. Alternative splicing results in multiple transcript variants.
6142 RPL18A ENSG00000105640 ribosomal protein L18a Ribosomes, the organelles that catalyze protein synthesis, consist of a small 40S subunit and a large 60S subunit. Together these subunits are composed of 4 RNA species and approximately 80 structurally distinct proteins. This gene encodes a member of the L18AE family of ribosomal proteins that is a component of the 60S subunit. The encoded protein may play a role in viral replication by interacting with the hepatitis C virus internal ribosome entry site (IRES). This gene is co-transcribed with the U68 snoRNA, located within the third intron. As is typical for genes encoding ribosomal proteins, there are multiple processed pseudogenes of this gene dispersed throughout the genome.
6230 RPS25 ENSG00000118181 ribosomal protein S25 Ribosomes, the organelles that catalyze protein synthesis, consist of a small 40S subunit and a large 60S subunit. Together these subunits are composed of 4 RNA species and approximately 80 structurally distinct proteins. This gene encodes a ribosomal protein that is a component of the 40S subunit. The protein belongs to the S25E family of ribosomal proteins. It is located in the cytoplasm. As is typical for genes encoding ribosomal proteins, there are multiple processed pseudogenes of this gene dispersed through the genome.
ENSG00000234851 RPL23AP42 ENSG00000234851 ribosomal protein L23a pseudogene 42 NA
ENSG00000234797 RPS3AP6 ENSG00000234797 ribosomal protein S3A pseudogene 6 NA
write.table(as.factor(out$query), paste0("../utilities/GTEX2013_sparse_fac_sqrt/gene_names_clus_",10,".txt"), col.names = FALSE,
            row.names=FALSE, quote=FALSE);
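
Some lookups in these tables come back unresolved (a `notfound` flag) or match more than one record (one Ensembl ID listed under two symbols), so the ID list written above can contain repeats. The sketch below is one way to flag both cases before exporting. It runs on a small hand-built data frame, `annot`, that stands in for `as.data.frame(out)`; the names `annot`, `missing_ids`, `dup_ids`, and `ids_to_write` are illustrative and not part of the original analysis.

```r
# Toy stand-in for as.data.frame(out); the rows mimic three patterns
# that occur in the annotation tables (resolved, not found, duplicated).
annot <- data.frame(
  query    = c("ENSG00000163017", "ENSG00000117289",
               "ENSG00000112096", "ENSG00000112096"),
  symbol   = c("ACTG2", NA, "LOC100129518", "SOD2"),
  notfound = c(NA, TRUE, NA, NA),
  stringsAsFactors = FALSE
)

# Query IDs that the lookup could not resolve (notfound is TRUE).
missing_ids <- annot$query[annot$notfound %in% TRUE]

# Query IDs that returned more than one hit.
dup_ids <- unique(annot$query[duplicated(annot$query)])

# One ID per gene, in the original order, ready for write.table().
ids_to_write <- unique(annot$query)

print(missing_ids)
print(dup_ids)
print(ids_to_write)
```

Writing `ids_to_write` rather than `out$query` would keep one line per gene in the exported file. Whether to also drop the unresolved IDs is a judgment call that depends on the downstream tool.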

Factor 11 Annotations

out <- mygene::queryMany(gene_list[11,],  scopes="ensembl.gene", fields=c("name", "summary", "symbol"), species="human");
## Finished
## Pass returnall=TRUE to return lists of duplicate or missing query terms.
kable(as.data.frame(out))
query symbol X_id summary name notfound
ENSG00000163017 ACTG2 72 Actins are highly conserved proteins that are involved in various types of cell motility and in the maintenance of the cytoskeleton. Three types of actins, alpha, beta and gamma, have been identified in vertebrates. Alpha actins are found in muscle tissues and are a major constituent of the contractile apparatus. The beta and gamma actins co-exist in most cell types as components of the cytoskeleton and as mediators of internal cell motility. This gene encodes actin gamma 2; a smooth muscle actin found in enteric tissues. Alternative splicing results in multiple transcript variants encoding distinct isoforms. Based on similarity to peptide cleavage of related actins, the mature protein of this gene is formed by removal of two N-terminal peptides. actin, gamma 2, smooth muscle, enteric NA
ENSG00000106624 AEBP1 165 This gene encodes a member of carboxypeptidase A protein family. The encoded protein may function as a transcriptional repressor and play a role in adipogenesis and smooth muscle cell differentiation. Studies in mice suggest that this gene functions in wound healing and abdominal wall development. Overexpression of this gene is associated with glioblastoma. AE binding protein 1 NA
ENSG00000026025 VIM 7431 This gene encodes a member of the intermediate filament family. Intermediate filaments, along with microtubules and actin microfilaments, make up the cytoskeleton. The protein encoded by this gene is responsible for maintaining cell shape, integrity of the cytoplasm, and stabilizing cytoskeletal interactions. It is also involved in the immune response, and controls the transport of low-density lipoprotein (LDL)-derived cholesterol from a lysosome to the site of esterification. It functions as an organizer of a number of critical proteins involved in attachment, migration, and cell signaling. Mutations in this gene cause a dominant, pulverulent cataract. vimentin NA
ENSG00000172867 KRT2 3849 The protein encoded by this gene is a member of the keratin gene family. The type II cytokeratins consist of basic or neutral proteins which are arranged in pairs of heterotypic keratin chains coexpressed during differentiation of simple and stratified epithelial tissues. This type II cytokeratin is expressed largely in the upper spinous layer of epidermal keratinocytes and mutations in this gene have been associated with bullous congenital ichthyosiform erythroderma. The type II cytokeratins are clustered in a region of chromosome 12q12-q13. keratin 2 NA
ENSG00000133392 MYH11 4629 The protein encoded by this gene is a smooth muscle myosin belonging to the myosin heavy chain family. The gene product is a subunit of a hexameric protein that consists of two heavy chain subunits and two pairs of non-identical light chain subunits. It functions as a major contractile protein, converting chemical energy into mechanical energy through the hydrolysis of ATP. The gene encoding a human ortholog of rat NUDE1 is transcribed from the reverse strand of this gene, and its 3’ end overlaps with that of the latter. The pericentric inversion of chromosome 16 [inv(16)(p13q22)] produces a chimeric transcript that encodes a protein consisting of the first 165 residues from the N terminus of core-binding factor beta in a fusion with the C-terminal portion of the smooth muscle myosin heavy chain. This chromosomal rearrangement is associated with acute myeloid leukemia of the M4Eo subtype. Alternative splicing generates isoforms that are differentially expressed, with ratios changing during muscle cell maturation. Alternatively spliced transcript variants encoding different isoforms have been identified. myosin, heavy chain 11, smooth muscle NA
ENSG00000075624 ACTB 60 This gene encodes one of six different actin proteins. Actins are highly conserved proteins that are involved in cell motility, structure, and integrity. This actin is a major constituent of the contractile apparatus and one of the two nonmuscle cytoskeletal actins. actin, beta NA
ENSG00000042832 TG 7038 Thyroglobulin (Tg) is a glycoprotein homodimer produced predominantly by the thyroid gland. It acts as a substrate for the synthesis of thyroxine and triiodothyronine as well as the storage of the inactive forms of thyroid hormone and iodine. Thyroglobulin is secreted from the endoplasmic reticulum to its site of iodination, and subsequent thyroxine biosynthesis, in the follicular lumen. Mutations in this gene cause thyroid dyshormonogenesis, manifested as goiter, and are associated with moderate to severe congenital hypothyroidism. Polymorphisms in this gene are associated with susceptibility to autoimmune thyroid diseases (AITD) such as Graves disease and Hashimoto thyroiditis. thyroglobulin NA
ENSG00000117289 NA NA NA NA TRUE
ENSG00000096696 DSP 1832 This gene encodes a protein that anchors intermediate filaments to desmosomal plaques and forms an obligate component of functional desmosomes. Mutations in this gene are the cause of several cardiomyopathies and keratodermas, including skin fragility-woolly hair syndrome. Alternative splicing results in multiple transcript variants. desmoplakin NA
ENSG00000225630 MTND2P28 ENSG00000225630 NA mitochondrially encoded NADH:ubiquinone oxidoreductase core subunit 2 pseudogene 28 NA
ENSG00000169710 FASN 2194 The enzyme encoded by this gene is a multifunctional protein. Its main function is to catalyze the synthesis of palmitate from acetyl-CoA and malonyl-CoA, in the presence of NADPH, into long-chain saturated fatty acids. In some cancer cell lines, this protein has been found to be fused with estrogen receptor-alpha (ER-alpha), in which the N-terminus of FAS is fused in-frame with the C-terminus of ER-alpha. fatty acid synthase NA
ENSG00000112096 LOC100129518 100129518 NA uncharacterized LOC100129518 NA
ENSG00000112096 SOD2 6648 This gene is a member of the iron/manganese superoxide dismutase family. It encodes a mitochondrial protein that forms a homotetramer and binds one manganese ion per subunit. This protein binds to the superoxide byproducts of oxidative phosphorylation and converts them to hydrogen peroxide and diatomic oxygen. Mutations in this gene have been associated with idiopathic cardiomyopathy (IDC), premature aging, sporadic motor neuron disease, and cancer. Alternative splicing of this gene results in multiple transcript variants. A related pseudogene has been identified on chromosome 1. superoxide dismutase 2, mitochondrial NA
ENSG00000130176 CNN1 1264 NA calponin 1 NA
ENSG00000186395 KRT10 3858 This gene encodes a member of the type I (acidic) cytokeratin family, which belongs to the superfamily of intermediate filament (IF) proteins. Keratins are heteropolymeric structural proteins which form the intermediate filament. These filaments, along with actin microfilaments and microtubules, compose the cytoskeleton of epithelial cells. Mutations in this gene are associated with epidermolytic hyperkeratosis. This gene is located within a cluster of keratin family members on chromosome 17q21. keratin 10 NA
ENSG00000155657 TTN 7273 This gene encodes a large abundant protein of striated muscle. The product of this gene is divided into two regions, a N-terminal I-band and a C-terminal A-band. The I-band, which is the elastic part of the molecule, contains two regions of tandem immunoglobulin domains on either side of a PEVK region that is rich in proline, glutamate, valine and lysine. The A-band, which is thought to act as a protein-ruler, contains a mixture of immunoglobulin and fibronectin repeats, and possesses kinase activity. An N-terminal Z-disc region and a C-terminal M-line region bind to the Z-line and M-line of the sarcomere, respectively, so that a single titin molecule spans half the length of a sarcomere. Titin also contains binding sites for muscle associated proteins so it serves as an adhesion template for the assembly of contractile machinery in muscle cells. It has also been identified as a structural protein for chromosomes. Alternative splicing of this gene results in multiple transcript variants. Considerable variability exists in the I-band, the M-line and the Z-disc regions of titin. Variability in the I-band region contributes to the differences in elasticity of different titin isoforms and, therefore, to the differences in elasticity of different muscle types. Mutations in this gene are associated with familial hypertrophic cardiomyopathy 9, and autoantibodies to titin are produced in patients with the autoimmune disease scleroderma. titin NA
ENSG00000143847 PPFIA4 8497 PPFIA4, or liprin-alpha-4, belongs to the liprin-alpha gene family. See liprin-alpha-1 (LIP1, or PPFIA1; MIM 611054) for background on liprins. PTPRF interacting protein alpha 4 NA
ENSG00000269936 RP11-394O4.5 ENSG00000269936 NA NA NA
ENSG00000162896 PIGR 5284 This gene is a member of the immunoglobulin superfamily. The encoded poly-Ig receptor binds polymeric immunoglobulin molecules at the basolateral surface of epithelial cells; the complex is then transported across the cell to be secreted at the apical surface. A significant association was found between immunoglobulin A nephropathy and several SNPs in this gene. polymeric immunoglobulin receptor NA
ENSG00000099194 SCD 6319 This gene encodes an enzyme involved in fatty acid biosynthesis, primarily the synthesis of oleic acid. The protein belongs to the fatty acid desaturase family and is an integral membrane protein located in the endoplasmic reticulum. Transcripts of approximately 3.9 and 5.2 kb, differing only by alternative polyadenylation signals, have been detected. A gene encoding a similar enzyme is located on chromosome 4 and a pseudogene of this gene is located on chromosome 17. stearoyl-CoA desaturase NA
ENSG00000100345 MYH9 4627 This gene encodes a conventional non-muscle myosin; this protein should not be confused with the unconventional myosin-9a or 9b (MYO9A or MYO9B). The encoded protein is a myosin IIA heavy chain that contains an IQ domain and a myosin head-like domain which is involved in several important functions, including cytokinesis, cell motility and maintenance of cell shape. Defects in this gene have been associated with non-syndromic sensorineural deafness autosomal dominant type 17, Epstein syndrome, Alport syndrome with macrothrombocytopenia, Sebastian syndrome, Fechtner syndrome and macrothrombocytopenia with progressive sensorineural deafness. myosin, heavy chain 9, non-muscle NA
ENSG00000159251 ACTC1 70 Actins are highly conserved proteins that are involved in various types of cell motility. Polymerization of globular actin (G-actin) leads to a structural filament (F-actin) in the form of a two-stranded helix. Each actin can bind to four others. The protein encoded by this gene belongs to the actin family which is comprised of three main groups of actin isoforms, alpha, beta, and gamma. The alpha actins are found in muscle tissues and are a major constituent of the contractile apparatus. Defects in this gene have been associated with idiopathic dilated cardiomyopathy (IDC) and familial hypertrophic cardiomyopathy (FHC). actin, alpha, cardiac muscle 1 NA
ENSG00000196091 MYBPC1 4604 This gene encodes a member of the myosin-binding protein C family. Myosin-binding protein C family members are myosin-associated proteins found in the cross-bridge-bearing zone (C region) of A bands in striated muscle. The encoded protein is the slow skeletal muscle isoform of myosin-binding protein C and plays an important role in muscle contraction by recruiting muscle-type creatine kinase to myosin filaments. Mutations in this gene are associated with distal arthrogryposis type I. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene. myosin binding protein C, slow type NA
ENSG00000049323 LTBP1 4052 The protein encoded by this gene belongs to the family of latent TGF-beta binding proteins (LTBPs). The secretion and activation of TGF-betas is regulated by their association with latency-associated proteins and with latent TGF-beta binding proteins. The product of this gene targets latent complexes of transforming growth factor beta to the extracellular matrix, where the latent cytokine is subsequently activated by several different mechanisms. Alternatively spliced transcript variants encoding different isoforms have been identified. latent transforming growth factor beta binding protein 1 NA
ENSG00000185303 SFTPA2 729238 This gene is one of several genes encoding pulmonary-surfactant associated proteins (SFTPA) located on chromosome 10. Mutations in this gene and a highly similar gene located nearby, which affect the highly conserved carbohydrate recognition domain, are associated with idiopathic pulmonary fibrosis. The current version of the assembly displays only a single centromeric SFTPA gene pair rather than the two gene pairs shown in the previous assembly which were thought to have resulted from a duplication. surfactant protein A2 NA
ENSG00000109472 CPE 1363 This gene encodes a member of the M14 family of metallocarboxypeptidases. The encoded preproprotein is proteolytically processed to generate the mature peptidase. This peripheral membrane protein cleaves C-terminal amino acid residues and is involved in the biosynthesis of peptide hormones and neurotransmitters, including insulin. This protein may also function independently of its peptidase activity, as a neurotrophic factor that promotes neuronal survival, and as a sorting receptor that binds to regulated secretory pathway proteins, including prohormones. Mutations in this gene are implicated in type 2 diabetes. carboxypeptidase E NA
ENSG00000188536 HBA2 3040 The human alpha globin gene cluster located on chromosome 16 spans about 30 kb and includes seven loci: 5’- zeta - pseudozeta - mu - pseudoalpha-1 - alpha-2 - alpha-1 - theta - 3’. The alpha-2 (HBA2) and alpha-1 (HBA1) coding sequences are identical. These genes differ slightly over the 5’ untranslated regions and the introns, but they differ significantly over the 3’ untranslated regions. Two alpha chains plus two beta chains constitute HbA, which in normal adult life comprises about 97% of the total hemoglobin; alpha chains combine with delta chains to constitute HbA-2, which with HbF (fetal hemoglobin) makes up the remaining 3% of adult hemoglobin. Alpha thalassemias result from deletions of each of the alpha genes as well as deletions of both HBA2 and HBA1; some nondeletion alpha thalassemias have also been reported. hemoglobin subunit alpha 2 NA
ENSG00000108515 ENO3 2027 This gene encodes one of the three enolase isoenzymes found in mammals. This isoenzyme is found in skeletal muscle cells in the adult where it may play a role in muscle development and regeneration. A switch from alpha enolase to beta enolase occurs in muscle tissue during development in rodents. Mutations in this gene have been associated with glycogen storage disease. Alternatively spliced transcript variants encoding different isoforms have been described. enolase 3 NA
ENSG00000168484 SFTPC 6440 This gene encodes the pulmonary-associated surfactant protein C (SPC), an extremely hydrophobic surfactant protein essential for lung function and homeostasis after birth. Pulmonary surfactant is a surface-active lipoprotein complex composed of 90% lipids and 10% proteins which include plasma proteins and apolipoproteins SPA, SPB, SPC and SPD. The surfactant is secreted by the alveolar cells of the lung and maintains the stability of pulmonary tissue by reducing the surface tension of fluids that coat the lung. Multiple mutations in this gene have been identified, which cause pulmonary surfactant metabolism dysfunction type 2, also called pulmonary alveolar proteinosis due to surfactant protein C deficiency, and are associated with interstitial lung disease in older infants, children, and adults. Alternatively spliced transcript variants encoding different protein isoforms have been identified. surfactant protein C NA
ENSG00000189058 APOD 347 This gene encodes a component of high density lipoprotein that has no marked similarity to other apolipoprotein sequences. It has a high degree of homology to plasma retinol-binding protein and other members of the alpha 2 microglobulin protein superfamily of carrier proteins, also known as lipocalins. This glycoprotein is closely associated with the enzyme lecithin:cholesterol acyltransferase - an enzyme involved in lipoprotein metabolism. apolipoprotein D NA
ENSG00000148677 ANKRD1 27063 The protein encoded by this gene is localized to the nucleus of endothelial cells and is induced by IL-1 and TNF-alpha stimulation. Studies in rat cardiomyocytes suggest that this gene functions as a transcription factor. Interactions between this protein and the sarcomeric proteins myopalladin and titin suggest that it may also be involved in the myofibrillar stretch-sensor system. ankyrin repeat domain 1 NA
ENSG00000197616 MYH6 4624 Cardiac muscle myosin is a hexamer consisting of two heavy chain subunits, two light chain subunits, and two regulatory subunits. This gene encodes the alpha heavy chain subunit of cardiac myosin. The gene is located 4kb downstream of the gene encoding the beta heavy chain subunit of cardiac myosin. Mutations in this gene cause familial hypertrophic cardiomyopathy and atrial septal defect 3. myosin, heavy chain 6, cardiac muscle, alpha NA
ENSG00000122852 SFTPA1 653509 This gene encodes a lung surfactant protein that is a member of a subfamily of C-type lectins called collectins. The encoded protein binds specific carbohydrate moieties found on lipids and on the surface of microorganisms. This protein plays an essential role in surfactant homeostasis and in the defense against respiratory pathogens. Mutations in this gene are associated with idiopathic pulmonary fibrosis. Alternate splicing results in multiple transcript variants. surfactant protein A1 NA
ENSG00000159176 CSRP1 1465 This gene encodes a member of the cysteine-rich protein (CSRP) family. This gene family includes a group of LIM domain proteins, which may be involved in regulatory processes important for development and cellular differentiation. The LIM/double zinc-finger motif found in this gene product occurs in proteins with critical functions in gene regulation, cell growth, and somatic differentiation. Alternatively spliced transcript variants have been described. cysteine and glycine rich protein 1 NA
ENSG00000170835 CEL 1056 The protein encoded by this gene is a glycoprotein secreted from the pancreas into the digestive tract and from the lactating mammary gland into human milk. The physiological role of this protein is in cholesterol and lipid-soluble vitamin ester hydrolysis and absorption. This encoded protein promotes large chylomicron production in the intestine. Also its presence in plasma suggests its interactions with cholesterol and oxidized lipoproteins to modulate the progression of atherosclerosis. In pancreatic tumoral cells, this encoded protein is thought to be sequestrated within the Golgi compartment and is probably not secreted. This gene contains a variable number of tandem repeat (VNTR) polymorphism in the coding region that may influence the function of the encoded protein. carboxyl ester lipase NA
ENSG00000163453 IGFBP7 3490 This gene encodes a member of the insulin-like growth factor (IGF)-binding protein (IGFBP) family. IGFBPs bind IGFs with high affinity, and regulate IGF availability in body fluids and tissues and modulate IGF binding to its receptors. This protein binds IGF-I and IGF-II with relatively low affinity, and belongs to a subfamily of low-affinity IGFBPs. It also stimulates prostacyclin production and cell adhesion. Alternatively spliced transcript variants encoding different isoforms have been described for this gene, and one variant has been associated with retinal arterial macroaneurysm (PMID:21835307). insulin like growth factor binding protein 7 NA
ENSG00000196616 ADH1B 125 The protein encoded by this gene is a member of the alcohol dehydrogenase family. Members of this enzyme family metabolize a wide variety of substrates, including ethanol, retinol, other aliphatic alcohols, hydroxysteroids, and lipid peroxidation products. This encoded protein, consisting of several homo- and heterodimers of alpha, beta, and gamma subunits, exhibits high activity for ethanol oxidation and plays a major role in ethanol catabolism. Three genes encoding alpha, beta and gamma subunits are tandemly organized in a genomic segment as a gene cluster. Two transcript variants encoding different isoforms have been found for this gene. alcohol dehydrogenase 1B (class I), beta polypeptide NA
ENSG00000182871 COL18A1 80781 This gene encodes the alpha chain of type XVIII collagen. This collagen is one of the multiplexins, extracellular matrix proteins that contain multiple triple-helix domains (collagenous domains) interrupted by non-collagenous domains. A long isoform of the protein has an N-terminal domain that is homologous to the extracellular part of frizzled receptors. Proteolytic processing at several endogenous cleavage sites in the C-terminal domain results in production of endostatin, a potent antiangiogenic protein that is able to inhibit angiogenesis and tumor growth. Mutations in this gene are associated with Knobloch syndrome. The main features of this syndrome involve retinal abnormalities, so type XVIII collagen may play an important role in retinal structure and in neural tube closure. Alternative splicing results in multiple transcript variants. collagen type XVIII alpha 1 chain NA
ENSG00000204388 HSPA1B 3304 This intronless gene encodes a 70kDa heat shock protein which is a member of the heat shock protein 70 family. In conjunction with other heat shock proteins, this protein stabilizes existing proteins against aggregation and mediates the folding of newly translated proteins in the cytosol and in organelles. It is also involved in the ubiquitin-proteasome pathway through interaction with the AU-rich element RNA-binding protein 1. The gene is located in the major histocompatibility complex class III region, in a cluster with two closely related genes which encode similar proteins. heat shock protein family A (Hsp70) member 1B NA
ENSG00000109061 MYH1 4619 Myosin is a major contractile protein which converts chemical energy into mechanical energy through the hydrolysis of ATP. Myosin is a hexameric protein composed of a pair of myosin heavy chains (MYH) and two pairs of nonidentical light chains. Myosin heavy chains are encoded by a multigene family. In mammals at least 10 different myosin heavy chain (MYH) isoforms have been described from striated, smooth, and nonmuscle cells. These isoforms show expression that is spatially and temporally regulated during development. myosin, heavy chain 1, skeletal muscle, adult NA
ENSG00000167658 EEF2 1938 This gene encodes a member of the GTP-binding translation elongation factor family. This protein is an essential factor for protein synthesis. It promotes the GTP-dependent translocation of the nascent protein chain from the A-site to the P-site of the ribosome. This protein is completely inactivated by EF-2 kinase phosphorylation. eukaryotic translation elongation factor 2 NA
ENSG00000149591 TAGLN 6876 The protein encoded by this gene is a transformation and shape-change sensitive actin cross-linking/gelling protein found in fibroblasts and smooth muscle. Its expression is down-regulated in many cell lines, and this down-regulation may be an early and sensitive marker for the onset of transformation. A functional role of this protein is unclear. Two transcript variants encoding the same protein have been found for this gene. transgelin NA
ENSG00000185650 ZFP36L1 677 This gene is a member of the TIS11 family of early response genes, which are induced by various agonists such as the phorbol ester TPA and the polypeptide mitogen EGF. This gene is well conserved across species and has a promoter that contains motifs seen in other early-response genes. The encoded protein contains a distinguishing putative zinc finger domain with a repeating cys-his motif. This putative nuclear transcription factor most likely functions in regulating the response to growth factors. Alternatively spliced transcript variants encoding different isoforms have been found for this gene. ZFP36 ring finger protein-like 1 NA
ENSG00000171401 KRT13 3860 The protein encoded by this gene is a member of the keratin gene family. The keratins are intermediate filament proteins responsible for the structural integrity of epithelial cells and are subdivided into cytokeratins and hair keratins. Most of the type I cytokeratins consist of acidic proteins which are arranged in pairs of heterotypic keratin chains. This type I cytokeratin is paired with keratin 4 and expressed in the suprabasal layers of non-cornified stratified epithelia. Mutations in this gene and keratin 4 have been associated with the autosomal dominant disorder White Sponge Nevus. The type I cytokeratins are clustered in a region of chromosome 17q21.2. Alternative splicing of this gene results in multiple transcript variants; however, not all variants have been described. keratin 13 NA
ENSG00000237973 MTCO1P12 ENSG00000237973 NA MT-CO1 pseudogene 12 NA
ENSG00000065534 MYLK 4638 This gene, a muscle member of the immunoglobulin gene superfamily, encodes myosin light chain kinase which is a calcium/calmodulin dependent enzyme. This kinase phosphorylates myosin regulatory light chains to facilitate myosin interaction with actin filaments to produce contractile activity. This gene encodes both smooth muscle and nonmuscle isoforms. In addition, using a separate promoter in an intron in the 3’ region, it encodes telokin, a small protein identical in sequence to the C-terminus of myosin light chain kinase, that is independently expressed in smooth muscle and functions to stabilize unphosphorylated myosin filaments. A pseudogene is located on the p arm of chromosome 3. Four transcript variants that produce four isoforms of the calcium/calmodulin dependent enzyme have been identified as well as two transcripts that produce two isoforms of telokin. Additional variants have been identified but lack full length transcripts. myosin light chain kinase NA
ENSG00000182253 SYNM 23336 The protein encoded by this gene is an intermediate filament (IF) family member. IF proteins are cytoskeletal proteins that confer resistance to mechanical stress and are encoded by a dispersed multigene family. This protein has been found to form a linkage between desmin, which is a subunit of the IF network, and the extracellular matrix, and provides an important structural support in muscle. Two alternatively spliced variants encoding different isoforms have been described for this gene. synemin NA
ENSG00000166819 PLIN1 5346 The protein encoded by this gene coats lipid storage droplets in adipocytes, thereby protecting them until they can be broken down by hormone-sensitive lipase. The encoded protein is the major cAMP-dependent protein kinase substrate in adipocytes and, when unphosphorylated, may play a role in the inhibition of lipolysis. Alternatively spliced transcript variants varying in the 5’ UTR, but encoding the same protein, have been found for this gene. perilipin 1 NA
ENSG00000203782 LOR 4014 This gene encodes loricrin, a major protein component of the cornified cell envelope found in terminally differentiated epidermal cells. Mutations in this gene are associated with Vohwinkel’s syndrome and progressive symmetric erythrokeratoderma, both inherited skin diseases. loricrin NA
ENSG00000131095 GFAP 2670 This gene encodes one of the major intermediate filament proteins of mature astrocytes. It is used as a marker to distinguish astrocytes from other glial cells during development. Mutations in this gene cause Alexander disease, a rare disorder of astrocytes in the central nervous system. Alternative splicing results in multiple transcript variants encoding distinct isoforms. glial fibrillary acidic protein NA
ENSG00000211445 GPX3 2878 This gene product belongs to the glutathione peroxidase family, which functions in the detoxification of hydrogen peroxide. It contains a selenocysteine (Sec) residue at its active site. The selenocysteine is encoded by the UGA codon, which normally signals translation termination. The 3’ UTR of Sec-containing genes have a common stem-loop structure, the sec insertion sequence (SECIS), which is necessary for the recognition of UGA as a Sec codon rather than as a stop signal. glutathione peroxidase 3 NA
ENSG00000211896 IGHG1 ENSG00000211896 NA immunoglobulin heavy constant gamma 1 (G1m marker) NA
ENSG00000099204 ABLIM1 3983 This gene encodes a cytoskeletal LIM protein that binds to actin filaments via a domain that is homologous to erythrocyte dematin. LIM domains, found in over 60 proteins, play key roles in the regulation of developmental pathways. LIM domains also function as protein-binding interfaces, mediating specific protein-protein interactions. The protein encoded by this gene could mediate such interactions between actin filaments and cytoplasmic targets. Alternatively spliced transcript variants encoding different isoforms have been identified. actin binding LIM protein 1 NA
ENSG00000130203 APOE 348 The protein encoded by this gene is a major apoprotein of the chylomicron. It binds to a specific liver and peripheral cell receptor, and is essential for the normal catabolism of triglyceride-rich lipoprotein constituents. This gene maps to chromosome 19 in a cluster with the related apolipoprotein C1 and C2 genes. Mutations in this gene result in familial dysbetalipoproteinemia, or type III hyperlipoproteinemia (HLP III), in which increased plasma cholesterol and triglycerides are the consequence of impaired clearance of chylomicron and VLDL remnants. Alternative splicing results in multiple transcript variants. apolipoprotein E NA
ENSG00000206172 HBA1 3039 The human alpha globin gene cluster located on chromosome 16 spans about 30 kb and includes seven loci: 5’- zeta - pseudozeta - mu - pseudoalpha-1 - alpha-2 - alpha-1 - theta - 3’. The alpha-2 (HBA2) and alpha-1 (HBA1) coding sequences are identical. These genes differ slightly over the 5’ untranslated regions and the introns, but they differ significantly over the 3’ untranslated regions. Two alpha chains plus two beta chains constitute HbA, which in normal adult life comprises about 97% of the total hemoglobin; alpha chains combine with delta chains to constitute HbA-2, which with HbF (fetal hemoglobin) makes up the remaining 3% of adult hemoglobin. Alpha thalassemias result from deletions of each of the alpha genes as well as deletions of both HBA2 and HBA1; some nondeletion alpha thalassemias have also been reported. hemoglobin subunit alpha 1 NA
ENSG00000158887 MPZ 4359 This gene is specifically expressed in Schwann cells of the peripheral nervous system and encodes a type I transmembrane glycoprotein that is a major structural protein of the peripheral myelin sheath. The encoded protein contains a large hydrophobic extracellular domain and a smaller basic intracellular domain, which are essential for the formation and stabilization of the multilamellar structure of the compact myelin. Mutations in this gene are associated with autosomal dominant form of Charcot-Marie-Tooth disease type 1 (CMT1B) and other polyneuropathies, such as Dejerine-Sottas syndrome (DSS) and congenital hypomyelinating neuropathy (CHN). A recent study showed that two isoforms are produced from the same mRNA by use of alternative in-frame translation termination codons via a stop codon readthrough mechanism. myelin protein zero NA
ENSG00000142156 COL6A1 1291 The collagens are a superfamily of proteins that play a role in maintaining the integrity of various tissues. Collagens are extracellular matrix proteins and have a triple-helical domain as their common structural element. Collagen VI is a major structural component of microfibrils. The basic structural unit of collagen VI is a heterotrimer of the alpha1(VI), alpha2(VI), and alpha3(VI) chains. The alpha2(VI) and alpha3(VI) chains are encoded by the COL6A2 and COL6A3 genes, respectively. The protein encoded by this gene is the alpha 1 subunit of type VI collagen (alpha1(VI) chain). Mutations in the genes that code for the collagen VI subunits result in the autosomal dominant disorder, Bethlem myopathy. collagen type VI alpha 1 NA
ENSG00000167588 GPD1 2819 This gene encodes a member of the NAD-dependent glycerol-3-phosphate dehydrogenase family. The encoded protein plays a critical role in carbohydrate and lipid metabolism by catalyzing the reversible conversion of dihydroxyacetone phosphate (DHAP) and reduced nicotinamide adenine dinucleotide (NADH) to glycerol-3-phosphate (G3P) and NAD+. The encoded cytosolic protein and mitochondrial glycerol-3-phosphate dehydrogenase also form a glycerol phosphate shuttle that facilitates the transfer of reducing equivalents from the cytosol to mitochondria. Mutations in this gene are a cause of transient infantile hypertriglyceridemia. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene. glycerol-3-phosphate dehydrogenase 1 NA
ENSG00000197249 SERPINA1 5265 The protein encoded by this gene is secreted and is a serine protease inhibitor whose targets include elastase, plasmin, thrombin, trypsin, chymotrypsin, and plasminogen activator. Defects in this gene can cause emphysema or liver disease. Several transcript variants encoding the same protein have been found for this gene. serpin family A member 1 NA
ENSG00000112139 MDGA1 266727 NA MAM domain containing glycosylphosphatidylinositol anchor 1 NA
ENSG00000118985 ELL2 22936 NA elongation factor for RNA polymerase II 2 NA
ENSG00000118194 TNNT2 7139 The protein encoded by this gene is the tropomyosin-binding subunit of the troponin complex, which is located on the thin filament of striated muscles and regulates muscle contraction in response to alterations in intracellular calcium ion concentration. Mutations in this gene have been associated with familial hypertrophic cardiomyopathy as well as with dilated cardiomyopathy. Transcripts for this gene undergo alternative splicing that results in many tissue-specific isoforms, however, the full-length nature of some of these variants has not yet been determined. troponin T2, cardiac type NA
ENSG00000129521 EGLN3 112399 NA egl-9 family hypoxia inducible factor 3 NA
ENSG00000011105 TSPAN9 10867 The protein encoded by this gene is a member of the transmembrane 4 superfamily, also known as the tetraspanin family. Most of these members are cell-surface proteins that are characterized by the presence of four hydrophobic domains. The proteins mediate signal transduction events that play a role in the regulation of cell development, activation, growth and motility. Alternatively spliced transcripts encoding the same protein have been identified. tetraspanin 9 NA
ENSG00000158516 CPA2 1358 Three different forms of human pancreatic procarboxypeptidase A have been isolated. The encoded protein represents the A2 form, which is a monomeric protein with different biochemical properties from the A1 and A3 forms. The A2 form of pancreatic procarboxypeptidase acts on aromatic C-terminal residues and is a secreted protein. carboxypeptidase A2 NA
ENSG00000169604 ANTXR1 84168 This gene encodes a type I transmembrane protein and is a tumor-specific endothelial marker that has been implicated in colorectal cancer. The encoded protein has been shown to also be a docking protein or receptor for Bacillus anthracis toxin, the causative agent of the disease, anthrax. The binding of the protective antigen (PA) component, of the tripartite anthrax toxin, to this receptor protein mediates delivery of toxin components to the cytosol of cells. Once inside the cell, the other two components of anthrax toxin, edema factor (EF) and lethal factor (LF) disrupt normal cellular processes. Three alternatively spliced variants that encode different protein isoforms have been described. anthrax toxin receptor 1 NA
ENSG00000198467 TPM2 7169 This gene encodes beta-tropomyosin, a member of the actin filament binding protein family, and mainly expressed in slow, type 1 muscle fibers. Mutations in this gene can alter the expression of other sarcomeric tropomyosin proteins, and cause cap disease, nemaline myopathy and distal arthrogryposis syndromes. Alternatively spliced transcript variants encoding different isoforms have been found for this gene. tropomyosin 2 (beta) NA
ENSG00000163346 PBXIP1 57326 The protein encoded by this gene interacts with the PBX1 homeodomain protein, inhibiting its transcriptional activation potential by preventing its binding to DNA. The encoded protein, which is primarily cytosolic but can shuttle to the nucleus, also can interact with estrogen receptors alpha and beta and promote the proliferation of breast cancer, brain tumors, and lung cancer. Several transcript variants encoding different isoforms have been found for this gene. More variants exist, but their full-length natures have yet to be determined. PBX homeobox interacting protein 1 NA
ENSG00000211890 IGHA2 ENSG00000211890 NA immunoglobulin heavy constant alpha 2 (A2m marker) NA
ENSG00000118257 NRP2 8828 This gene encodes a member of the neuropilin family of receptor proteins. The encoded transmembrane protein binds to SEMA3C protein {sema domain, immunoglobulin domain (Ig), short basic domain, secreted, (semaphorin) 3C} and SEMA3F protein {sema domain, immunoglobulin domain (Ig), short basic domain, secreted, (semaphorin) 3F}, and interacts with vascular endothelial growth factor (VEGF). This protein may play a role in cardiovascular development, axon guidance, and tumorigenesis. Multiple transcript variants encoding distinct isoforms have been identified for this gene. neuropilin 2 NA
ENSG00000125618 PAX8 7849 This gene encodes a member of the paired box (PAX) family of transcription factors. Members of this gene family typically encode proteins that contain a paired box domain, an octapeptide, and a paired-type homeodomain. This nuclear protein is involved in thyroid follicular cell development and expression of thyroid-specific genes. Mutations in this gene have been associated with thyroid dysgenesis, thyroid follicular carcinomas and atypical follicular thyroid adenomas. Alternatively spliced transcript variants encoding different isoforms have been described. paired box 8 NA
ENSG00000225972 MTND1P23 ENSG00000225972 NA mitochondrially encoded NADH:ubiquinone oxidoreductase core subunit 1 pseudogene 23 NA
ENSG00000109846 CRYAB 1410 Mammalian lens crystallins are divided into alpha, beta, and gamma families. Alpha crystallins are composed of two gene products: alpha-A and alpha-B, for acidic and basic, respectively. Alpha crystallins can be induced by heat shock and are members of the small heat shock protein (HSP20) family. They act as molecular chaperones although they do not renature proteins and release them in the fashion of a true chaperone; instead they hold them in large soluble aggregates. Post-translational modifications decrease the ability to chaperone. These heterogeneous aggregates consist of 30-40 subunits; the alpha-A and alpha-B subunits have a 3:1 ratio, respectively. Two additional functions of alpha crystallins are an autokinase activity and participation in the intracellular architecture. The encoded protein has been identified as a moonlighting protein based on its ability to perform mechanistically distinct functions. Alpha-A and alpha-B gene products are differentially expressed; alpha-A is preferentially restricted to the lens and alpha-B is expressed widely in many tissues and organs. Elevated expression of alpha-B crystallin occurs in many neurological diseases; a missense mutation cosegregated in a family with a desmin-related myopathy. Alternative splicing results in multiple transcript variants. crystallin alpha B NA
ENSG00000175206 NPPA 4878 The protein encoded by this gene belongs to the natriuretic peptide family. Natriuretic peptides are implicated in the control of extracellular fluid volume and electrolyte homeostasis. This protein is synthesized as a large precursor (containing a signal peptide), which is processed to release a peptide from the N-terminus with similarity to vasoactive peptide, cardiodilatin, and another peptide from the C-terminus with natriuretic-diuretic activity. Mutations in this gene have been associated with atrial fibrillation familial type 6. This gene is located adjacent to another member of the natriuretic family of peptides on chromosome 1. natriuretic peptide A NA
ENSG00000243955 GSTA1 2938 This gene encodes a member of a family of enzymes that function to add glutathione to target electrophilic compounds, including carcinogens, therapeutic drugs, environmental toxins, and products of oxidative stress. This action is an important step in detoxification of these compounds. This subfamily of enzymes has a particular role in protecting cells from reactive oxygen species and the products of peroxidation. Polymorphisms in this gene influence the ability of individuals to metabolize different drugs. This gene is located in a cluster of similar genes and pseudogenes on chromosome 6. Alternative splicing results in multiple transcript variants. glutathione S-transferase alpha 1 NA
ENSG00000156113 KCNMA1 3778 MaxiK channels are large conductance, voltage and calcium-sensitive potassium channels which are fundamental to the control of smooth muscle tone and neuronal excitability. MaxiK channels can be formed by 2 subunits: the pore-forming alpha subunit, which is the product of this gene, and the modulatory beta subunit. Intracellular calcium regulates the physical association between the alpha and beta subunits. Alternatively spliced transcript variants encoding different isoforms have been identified. potassium calcium-activated channel subfamily M alpha 1 NA
ENSG00000035862 TIMP2 7077 This gene is a member of the TIMP gene family. The proteins encoded by this gene family are natural inhibitors of the matrix metalloproteinases, a group of peptidases involved in degradation of the extracellular matrix. In addition to an inhibitory role against metalloproteinases, the encoded protein has a unique role among TIMP family members in its ability to directly suppress the proliferation of endothelial cells. As a result, the encoded protein may be critical to the maintenance of tissue homeostasis by suppressing the proliferation of quiescent tissues in response to angiogenic factors, and by inhibiting protease activity in tissues undergoing remodelling of the extracellular matrix. TIMP metallopeptidase inhibitor 2 NA
ENSG00000183091 NEB 4703 This gene encodes nebulin, a giant protein component of the cytoskeletal matrix that coexists with the thick and thin filaments within the sarcomeres of skeletal muscle. In most vertebrates, nebulin accounts for 3 to 4% of the total myofibrillar protein. The encoded protein contains approximately 30-amino acid long modules that can be classified into 7 types and other repeated modules. Protein isoform sizes vary from 600 to 800 kD due to alternative splicing that is tissue-, species-, and developmental stage-specific. Of the 183 exons in the nebulin gene, at least 43 are alternatively spliced, although exons 143 and 144 are not found in the same transcript. Of the several thousand transcript variants predicted for nebulin, the RefSeq Project has decided to create three representative RefSeq records. Mutations in this gene are associated with recessive nemaline myopathy. nebulin NA
ENSG00000112378 PERP 64065 NA PERP, TP53 apoptosis effector NA
ENSG00000198624 CCDC69 26112 NA coiled-coil domain containing 69 NA
ENSG00000234961 RP11-124N14.3 ENSG00000234961 NA NA NA
ENSG00000142173 COL6A2 1292 This gene encodes one of the three alpha chains of type VI collagen, a beaded filament collagen found in most connective tissues. The product of this gene contains several domains similar to von Willebrand Factor type A domains. These domains have been shown to bind extracellular matrix proteins, an interaction that explains the importance of this collagen in organizing matrix components. Mutations in this gene are associated with Bethlem myopathy and Ullrich scleroatonic muscular dystrophy. Three transcript variants have been identified for this gene. collagen type VI alpha 2 NA
ENSG00000091490 SEL1L3 23231 NA SEL1L family member 3 NA
ENSG00000163220 S100A9 6280 The protein encoded by this gene is a member of the S100 family of proteins containing 2 EF-hand calcium-binding motifs. S100 proteins are localized in the cytoplasm and/or nucleus of a wide range of cells, and involved in the regulation of a number of cellular processes such as cell cycle progression and differentiation. S100 genes include at least 13 members which are located as a cluster on chromosome 1q21. This protein may function in the inhibition of casein kinase and altered expression of this protein is associated with the disease cystic fibrosis. This antimicrobial protein exhibits antifungal and antibacterial activity. S100 calcium binding protein A9 NA
ENSG00000263335 AF001548.5 ENSG00000263335 NA NA NA
ENSG00000187288 CIDEC 63924 This gene encodes a member of the cell death-inducing DNA fragmentation factor-like effector family. Members of this family play important roles in apoptosis. The encoded protein promotes lipid droplet formation in adipocytes and may mediate adipocyte apoptosis. This gene is regulated by insulin and its expression is positively correlated with insulin sensitivity. Mutations in this gene may contribute to insulin resistant diabetes. A pseudogene of this gene is located on the short arm of chromosome 3. Alternatively spliced transcript variants that encode different isoforms have been observed for this gene. cell death inducing DFFA like effector c NA
ENSG00000133065 SLC41A1 254428 NA solute carrier family 41 member 1 NA
ENSG00000147465 STAR 6770 The protein encoded by this gene plays a key role in the acute regulation of steroid hormone synthesis by enhancing the conversion of cholesterol into pregnenolone. This protein permits the cleavage of cholesterol into pregnenolone by mediating the transport of cholesterol from the outer mitochondrial membrane to the inner mitochondrial membrane. Mutations in this gene are a cause of congenital lipoid adrenal hyperplasia (CLAH), also called lipoid CAH. A pseudogene of this gene is located on chromosome 13. steroidogenic acute regulatory protein NA
ENSG00000008394 MGST1 4257 The MAPEG (Membrane Associated Proteins in Eicosanoid and Glutathione metabolism) family consists of six human proteins, two of which are involved in the production of leukotrienes and prostaglandin E, important mediators of inflammation. Other family members, demonstrating glutathione S-transferase and peroxidase activities, are involved in cellular defense against toxic, carcinogenic, and pharmacologically active electrophilic compounds. This gene encodes a protein that catalyzes the conjugation of glutathione to electrophiles and the reduction of lipid hydroperoxides. This protein is localized to the endoplasmic reticulum and outer mitochondrial membrane where it is thought to protect these membranes from oxidative stress. Several transcript variants, some non-protein coding and some protein coding, have been found for this gene. microsomal glutathione S-transferase 1 NA
ENSG00000109099 PMP22 5376 This gene encodes an integral membrane protein that is a major component of myelin in the peripheral nervous system. Studies suggest two alternately used promoters drive tissue-specific expression. Various mutations of this gene are causes of Charcot-Marie-Tooth disease Type IA, Dejerine-Sottas syndrome, and hereditary neuropathy with liability to pressure palsies. Alternative splicing results in multiple transcript variants. peripheral myelin protein 22 NA
ENSG00000078114 NEBL 10529 This gene encodes a nebulin like protein that is abundantly expressed in cardiac muscle. The encoded protein binds actin and interacts with thin filaments and Z-line associated proteins in striated muscle. This protein may be involved in cardiac myofibril assembly. A shorter isoform of this protein termed LIM nebulette is expressed in non-muscle cells and may function as a component of focal adhesion complexes. Alternate splicing results in multiple transcript variants. nebulette NA
ENSG00000147526 TACC1 6867 This locus may represent a breast cancer candidate gene. It is located close to FGFR1 on a region of chromosome 8 that is amplified in some breast cancers. Three transcript variants encoding different isoforms have been found for this gene. transforming acidic coiled-coil containing protein 1 NA
ENSG00000136999 NOV 4856 The protein encoded by this gene is a small secreted cysteine-rich protein and a member of the CCN family of regulatory proteins. CCN family proteins associate with the extracellular matrix and play an important role in cardiovascular and skeletal development, fibrosis and cancer development. nephroblastoma overexpressed NA
ENSG00000197747 S100A10 6281 The protein encoded by this gene is a member of the S100 family of proteins containing 2 EF-hand calcium-binding motifs. S100 proteins are localized in the cytoplasm and/or nucleus of a wide range of cells, and involved in the regulation of a number of cellular processes such as cell cycle progression and differentiation. S100 genes include at least 13 members which are located as a cluster on chromosome 1q21. This protein may function in exocytosis and endocytosis. S100 calcium binding protein A10 NA
ENSG00000179218 CALR 811 Calreticulin is a multifunctional protein that acts as a major Ca(2+)-binding (storage) protein in the lumen of the endoplasmic reticulum. It is also found in the nucleus, suggesting that it may have a role in transcription regulation. Calreticulin binds to the synthetic peptide KLGFFKR, which is almost identical to an amino acid sequence in the DNA-binding domain of the superfamily of nuclear receptors. Calreticulin binds to antibodies in certain sera of systemic lupus and Sjogren patients which contain anti-Ro/SSA antibodies, it is highly conserved among species, and it is located in the endoplasmic and sarcoplasmic reticulum where it may bind calcium. The amino terminus of calreticulin interacts with the DNA-binding domain of the glucocorticoid receptor and prevents the receptor from binding to its specific glucocorticoid response element. Calreticulin can inhibit the binding of androgen receptor to its hormone-responsive DNA element and can inhibit androgen receptor and retinoic acid receptor transcriptional activities in vivo, as well as retinoic acid-induced neuronal differentiation. Thus, calreticulin can act as an important modulator of the regulation of gene transcription by nuclear hormone receptors. Systemic lupus erythematosus is associated with increased autoantibody titers against calreticulin but calreticulin is not a Ro/SS-A antigen. Earlier papers referred to calreticulin as an Ro/SS-A antigen but this was later disproven. Increased autoantibody titer against human calreticulin is found in infants with complete congenital heart block of both the IgG and IgM classes. calreticulin NA
ENSG00000111245 MYL2 4633 This gene encodes the regulatory light chain associated with cardiac myosin beta (or slow) heavy chain. Ca2+ triggers the phosphorylation of regulatory light chain that in turn triggers contraction. Mutations in this gene are associated with mid-left ventricular chamber type hypertrophic cardiomyopathy. myosin light chain 2 NA
ENSG00000256545 NA NA NA NA TRUE
ENSG00000229344 MTCO2P12 ENSG00000229344 NA MT-CO2 pseudogene 12 NA
ENSG00000058668 ATP2B4 493 The protein encoded by this gene belongs to the family of P-type primary ion transport ATPases characterized by the formation of an aspartyl phosphate intermediate during the reaction cycle. These enzymes remove bivalent calcium ions from eukaryotic cells against very large concentration gradients and play a critical role in intracellular calcium homeostasis. The mammalian plasma membrane calcium ATPase isoforms are encoded by at least four separate genes and the diversity of these enzymes is further increased by alternative splicing of transcripts. The expression of different isoforms and splice variants is regulated in a developmental, tissue- and cell type-specific manner, suggesting that these pumps are functionally adapted to the physiological needs of particular cells and tissues. This gene encodes the plasma membrane calcium ATPase isoform 4. Alternatively spliced transcript variants encoding different isoforms have been identified. ATPase plasma membrane Ca2+ transporting 4 NA
ENSG00000180139 ACTA2-AS1 ENSG00000180139 NA ACTA2 antisense RNA 1 NA
ENSG00000135821 GLUL 2752 The protein encoded by this gene belongs to the glutamine synthetase family. It catalyzes the synthesis of glutamine from glutamate and ammonia in an ATP-dependent reaction. This protein plays a role in ammonia and glutamate detoxification, acid-base homeostasis, cell signaling, and cell proliferation. Glutamine is an abundant amino acid, and is important to the biosynthesis of several amino acids, pyrimidines, and purines. Mutations in this gene are associated with congenital glutamine deficiency, and overexpression of this gene was observed in some primary liver cancer samples. There are six pseudogenes of this gene found on chromosomes 2, 5, 9, 11, and 12. Alternative splicing results in multiple transcript variants. glutamate-ammonia ligase NA
write.table(as.factor(out$query), paste0("../utilities/GTEX2013_sparse_fac_sqrt/gene_names_clus_",11,".txt"), col.names = FALSE,
            row.names=FALSE, quote=FALSE);
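
The annotation tables can contain rows flagged `notfound` and Ensembl IDs that map to more than one record, and writing `out$query` directly carries both into the gene list file. If only resolved, unique IDs are wanted, a minimal sketch of a filter is given below. It assumes the result has the `query` and `notfound` columns shown in the tables; `resolved_queries` is a hypothetical helper name, not part of mygene, and the toy data frame only mimics the table structure.

```r
# Hypothetical helper (not part of mygene): return the unique query IDs
# that were resolved, given a queryMany-style result.
# Assumes a `query` column and, optionally, a logical `notfound` column.
resolved_queries <- function(out) {
  df <- as.data.frame(out)
  keep <- rep(TRUE, nrow(df))
  if ("notfound" %in% names(df)) {
    # Rows with notfound == TRUE are dropped; NA means the ID was resolved.
    keep <- is.na(df$notfound) | !as.logical(df$notfound)
  }
  # An ID that maps to several records is reported once.
  unique(as.character(df$query[keep]))
}

# Toy input mimicking the table layout (not real query output).
toy <- data.frame(
  query    = c("ENSG00000112096", "ENSG00000112096",
               "ENSG00000117289", "ENSG00000115414"),
  symbol   = c("LOC100129518", "SOD2", NA, "FN1"),
  notfound = c(NA, NA, TRUE, NA),
  stringsAsFactors = FALSE
)
ids <- resolved_queries(toy)
print(ids)
```

The returned vector could be passed to `write.table` in place of `as.factor(out$query)`. Whether unresolved IDs should be excluded depends on what the downstream tool expects, so treat the filter as optional.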

Factor 12 Annotations

out <- mygene::queryMany(gene_list[12,],  scopes="ensembl.gene", fields=c("name", "summary", "symbol"), species="human");
## Finished
## Pass returnall=TRUE to return lists of duplicate or missing query terms.
kable(as.data.frame(out))
summary query name X_id symbol notfound
This gene encodes fibronectin, a glycoprotein present in a soluble dimeric form in plasma, and in a dimeric or multimeric form at the cell surface and in extracellular matrix. The encoded preproprotein is proteolytically processed to generate the mature protein. Fibronectin is involved in cell adhesion and migration processes including embryogenesis, wound healing, blood coagulation, host defense, and metastasis. The gene has three regions subject to alternative splicing, with the potential to produce 20 different transcript variants, at least one of which encodes an isoform that undergoes proteolytic processing. The full-length nature of some variants has not been determined. ENSG00000115414 fibronectin 1 2335 FN1 NA
This gene encodes a member of the intermediate filament family. Intermediate filaments, along with microtubules and actin microfilaments, make up the cytoskeleton. The protein encoded by this gene is responsible for maintaining cell shape, integrity of the cytoplasm, and stabilizing cytoskeletal interactions. It is also involved in the immune response, and controls the transport of low-density lipoprotein (LDL)-derived cholesterol from a lysosome to the site of esterification. It functions as an organizer of a number of critical proteins involved in attachment, migration, and cell signaling. Mutations in this gene cause a dominant, pulverulent cataract. ENSG00000026025 vimentin 7431 VIM NA
This gene encodes a cysteine-rich acidic matrix-associated protein. The encoded protein is required for the collagen in bone to become calcified but is also involved in extracellular matrix synthesis and promotion of changes to cell shape. The gene product has been associated with tumor suppression but has also been correlated with metastasis based on changes to cell shape which can promote tumor cell invasion. Three transcript variants encoding different isoforms have been found for this gene. ENSG00000113140 secreted protein acidic and cysteine rich 6678 SPARC NA
Synaptopodin is an actin-associated protein that may play a role in actin-based cell shape and motility. The name synaptopodin derives from the protein’s associations with postsynaptic densities and dendritic spines and with renal podocytes (Mundel et al., 1997 [PubMed 9314539]). ENSG00000171992 synaptopodin 11346 SYNPO NA
NA ENSG00000229124 VIM antisense RNA 1 100507347 VIM-AS1 NA
This gene encodes a conventional non-muscle myosin; this protein should not be confused with the unconventional myosin-9a or 9b (MYO9A or MYO9B). The encoded protein is a myosin IIA heavy chain that contains an IQ domain and a myosin head-like domain which is involved in several important functions, including cytokinesis, cell motility and maintenance of cell shape. Defects in this gene have been associated with non-syndromic sensorineural deafness autosomal dominant type 17, Epstein syndrome, Alport syndrome with macrothrombocytopenia, Sebastian syndrome, Fechtner syndrome and macrothrombocytopenia with progressive sensorineural deafness. ENSG00000100345 myosin, heavy chain 9, non-muscle 4627 MYH9 NA
NA ENSG00000234961 NA ENSG00000234961 RP11-124N14.3 NA
The protein encoded by this gene is a smooth muscle myosin belonging to the myosin heavy chain family. The gene product is a subunit of a hexameric protein that consists of two heavy chain subunits and two pairs of non-identical light chain subunits. It functions as a major contractile protein, converting chemical energy into mechanical energy through the hydrolysis of ATP. The gene encoding a human ortholog of rat NUDE1 is transcribed from the reverse strand of this gene, and its 3’ end overlaps with that of the latter. The pericentric inversion of chromosome 16 [inv(16)(p13q22)] produces a chimeric transcript that encodes a protein consisting of the first 165 residues from the N terminus of core-binding factor beta in a fusion with the C-terminal portion of the smooth muscle myosin heavy chain. This chromosomal rearrangement is associated with acute myeloid leukemia of the M4Eo subtype. Alternative splicing generates isoforms that are differentially expressed, with ratios changing during muscle cell maturation. Alternatively spliced transcript variants encoding different isoforms have been identified. ENSG00000133392 myosin, heavy chain 11, smooth muscle 4629 MYH11 NA
The protein encoded by this gene belongs to the family of latent transforming growth factor (TGF)-beta binding proteins (LTBP), which are extracellular matrix proteins with multi-domain structure. This protein is the largest member of the LTBP family possessing unique regions and with most similarity to the fibrillins. It has thus been suggested that it may have multiple functions: as a member of the TGF-beta latent complex, as a structural component of microfibrils, and a role in cell adhesion. ENSG00000119681 latent transforming growth factor beta binding protein 2 4053 LTBP2 NA
This gene encodes a preproprotein that is proteolytically processed to form multiple protein products. The major encoded protein product, lactadherin, is a membrane glycoprotein that promotes phagocytosis of apoptotic cells. This protein has also been implicated in wound healing, autoimmune disease, and cancer. Lactadherin can be further processed to form a smaller cleavage product, medin, which comprises the major protein component of aortic medial amyloid (AMA). Alternative splicing results in multiple transcript variants. ENSG00000140545 milk fat globule-EGF factor 8 protein 4240 MFGE8 NA
The protein encoded by this gene is a mitogen that is secreted by vascular endothelial cells. The encoded protein plays a role in chondrocyte proliferation and differentiation, cell adhesion in many cell types, and is related to platelet-derived growth factor. Certain polymorphisms in this gene have been linked with a higher incidence of systemic sclerosis. ENSG00000118523 connective tissue growth factor 1490 CTGF NA
This gene encodes the third discovered human homologue of the Drosophila melanogaster type I membrane protein notch. In Drosophila, notch interaction with its cell-bound ligands (delta, serrate) establishes an intercellular signalling pathway that plays a key role in neural development. Homologues of the notch-ligands have also been identified in human, but precise interactions between these ligands and the human notch homologues remain to be determined. Mutations in NOTCH3 have been identified as the underlying cause of cerebral autosomal dominant arteriopathy with subcortical infarcts and leukoencephalopathy (CADASIL). ENSG00000074181 notch 3 4854 NOTCH3 NA
NA ENSG00000112096 uncharacterized LOC100129518 100129518 LOC100129518 NA
This gene is a member of the iron/manganese superoxide dismutase family. It encodes a mitochondrial protein that forms a homotetramer and binds one manganese ion per subunit. This protein binds to the superoxide byproducts of oxidative phosphorylation and converts them to hydrogen peroxide and diatomic oxygen. Mutations in this gene have been associated with idiopathic cardiomyopathy (IDC), premature aging, sporadic motor neuron disease, and cancer. Alternative splicing of this gene results in multiple transcript variants. A related pseudogene has been identified on chromosome 1. ENSG00000112096 superoxide dismutase 2, mitochondrial 6648 SOD2 NA
The protein encoded by this gene is a member of the S100 family of proteins containing 2 EF-hand calcium-binding motifs. S100 proteins are localized in the cytoplasm and/or nucleus of a wide range of cells, and involved in the regulation of a number of cellular processes such as cell cycle progression and differentiation. S100 genes include at least 13 members which are located as a cluster on chromosome 1q21. This protein may function in the inhibition of casein kinase and altered expression of this protein is associated with the disease cystic fibrosis. This antimicrobial protein exhibits antifungal and antibacterial activity. ENSG00000163220 S100 calcium binding protein A9 6280 S100A9 NA
This gene encodes a protein involved in glycolysis. The encoded protein is a pyruvate kinase that catalyzes the transfer of a phosphoryl group from phosphoenolpyruvate to ADP, generating ATP and pyruvate. This protein has been shown to interact with thyroid hormone and may mediate cellular metabolic effects induced by thyroid hormones. This protein has been found to bind Opa protein, a bacterial outer membrane protein involved in gonococcal adherence to and invasion of human cells, suggesting a role of this protein in bacterial pathogenesis. Several alternatively spliced transcript variants encoding a few distinct isoforms have been reported. ENSG00000067225 pyruvate kinase, muscle 5315 PKM NA
This gene is a member of the TIMP gene family. The proteins encoded by this gene family are natural inhibitors of the matrix metalloproteinases, a group of peptidases involved in degradation of the extracellular matrix. In addition to an inhibitory role against metalloproteinases, the encoded protein has a unique role among TIMP family members in its ability to directly suppress the proliferation of endothelial cells. As a result, the encoded protein may be critical to the maintenance of tissue homeostasis by suppressing the proliferation of quiescent tissues in response to angiogenic factors, and by inhibiting protease activity in tissues undergoing remodelling of the extracellular matrix. ENSG00000035862 TIMP metallopeptidase inhibitor 2 7077 TIMP2 NA
Plectin is a prominent member of an important family of structurally and in part functionally related proteins, termed plakins or cytolinkers, that are capable of interlinking different elements of the cytoskeleton. Plakins, with their multi-domain structure and enormous size, not only play crucial roles in maintaining cell and tissue integrity and orchestrating dynamic changes in cytoarchitecture and cell shape, but also serve as scaffolding platforms for the assembly, positioning, and regulation of signaling complexes (reviewed in PMID: 9701547, 11854008, and 17499243). Plectin is expressed as several protein isoforms in a wide range of cell types and tissues from a single gene located on chromosome 8 in humans (PMID: 8633055, 8698233). Until 2010, this locus was named plectin 1 (symbol PLEC1 in human; Plec1 in mouse and rat) and the gene product had been referred to as ‘hemidesmosomal protein 1’ or ‘plectin 1, intermediate filament binding 500kDa’. These names were superseded by plectin. The plectin gene locus in mouse on chromosome 15 has been analyzed in detail (PMID: 10556294, 14559777), revealing a genomic exon-intron organization with well over 40 exons spanning over 62 kb and an unusual 5’ transcript complexity of plectin isoforms. Eleven exons (1-1j) have been identified that alternatively splice directly into a common exon 2 which is the first exon to encode plectin’s highly conserved actin binding domain (ABD). Three additional exons (-1, 0a, and 0) splice into an alternative first coding exon (1c), and two additional exons (2alpha and 3alpha) are optionally spliced within the exons encoding the actin binding domain (exons 2-8). Analysis of the human locus has identified eight of the eleven alternative 5’ exons found in mouse and rat (PMID: 14672974); exons 1i, 1j and 1h have not been confirmed in human. Furthermore, isoforms lacking the central rod domain encoded by exon 31 have been detected in mouse (PMID:10556294), rat (PMID: 9177781), and human (PMID: 11441066, 10780662, 20052759). The short alternative amino-terminal sequences encoded by the different first exons direct the targeting of the various isoforms to distinct subcellular locations (PMID: 14559777). As the expression of specific plectin isoforms was found to be dependent on cell type (tissue) and stage of development (PMID: 10556294, 12542521, 17389230) it appears that each cell type (tissue) contains a unique set (proportion and composition) of plectin isoforms, as if custom-made for specific requirements of the particular cells. Concordantly, individual isoforms were found to carry out distinct and specific functions (PMID: 14559777, 12542521, 18541706). In 1996, a number of groups reported that patients suffering from epidermolysis bullosa simplex with muscular dystrophy (EBS-MD) lacked plectin expression in skin and muscle tissues due to defects in the plectin gene (PMID: 8698233, 8941634, 8636409, 8894687, 8696340). Two other subtypes of plectin-related EBS have been described: EBS-pyloric atresia (PA) and EBS-Ogna. For reviews of plectin-related diseases see PMID: 15810881, 19945614. Mutations in the plectin gene related to human diseases should be named based on the position in NM_000445 (variant 1, isoform 1c), unless the mutation is located within one of the other alternative first exons, in which case the position in the respective Reference Sequence should be used. ENSG00000178209 plectin 5339 PLEC NA
The protein encoded by this gene binds transforming growth factor beta (TGFB) as it is secreted and targeted to the extracellular matrix. TGFB is biologically latent after secretion and insertion into the extracellular matrix, and sheds TGFB and other proteins upon activation. Defects in this gene may be a cause of cutis laxa and severe pulmonary, gastrointestinal, and urinary abnormalities. Three transcript variants encoding different isoforms have been found for this gene. ENSG00000090006 latent transforming growth factor beta binding protein 4 8425 LTBP4 NA
This gene encodes a protein that is a member of the dickkopf family. The secreted protein contains two cysteine rich regions and is involved in embryonic development through its interactions with the Wnt signaling pathway. The expression of this gene is decreased in a variety of cancer cell lines and it may function as a tumor suppressor gene. Alternative splicing results in multiple transcript variants encoding the same protein. ENSG00000050165 dickkopf WNT signaling pathway inhibitor 3 27122 DKK3 NA
This gene encodes a member of the insulin-like growth factor (IGF)-binding protein (IGFBP) family. IGFBPs bind IGFs with high affinity, and regulate IGF availability in body fluids and tissues and modulate IGF binding to its receptors. This protein binds IGF-I and IGF-II with relatively low affinity, and belongs to a subfamily of low-affinity IGFBPs. It also stimulates prostacyclin production and cell adhesion. Alternatively spliced transcript variants encoding different isoforms have been described for this gene, and one variant has been associated with retinal arterial macroaneurysm (PMID:21835307). ENSG00000163453 insulin like growth factor binding protein 7 3490 IGFBP7 NA
This gene encodes a gamma-carboxyglutamic acid (Gla)-containing protein thought to be involved in the stimulation of cell proliferation. This gene is frequently overexpressed in many cancers and has been implicated as an adverse prognostic marker. Elevated protein levels are additionally associated with a variety of disease states, including venous thromboembolic disease, systemic lupus erythematosus, chronic renal failure, and preeclampsia. ENSG00000183087 growth arrest specific 6 2621 GAS6 NA
This gene encodes a protein that is one of the two components of elastic fibers. The encoded protein is rich in hydrophobic amino acids such as glycine and proline, which form mobile hydrophobic regions bounded by crosslinks between lysine residues. Deletions and mutations in this gene are associated with supravalvular aortic stenosis (SVAS) and autosomal dominant cutis laxa. Multiple transcript variants encoding different isoforms have been found for this gene. ENSG00000049540 elastin 2006 ELN NA
The protein encoded by this gene is a member of the S100 family of proteins containing 2 EF-hand calcium-binding motifs. S100 proteins are localized in the cytoplasm and/or nucleus of a wide range of cells, and involved in the regulation of a number of cellular processes such as cell cycle progression and differentiation. S100 genes include at least 13 members which are located as a cluster on chromosome 1q21. This protein may function in the inhibition of casein kinase and as a cytokine. Altered expression of this protein is associated with the disease cystic fibrosis. Multiple transcript variants encoding different isoforms have been found for this gene. ENSG00000143546 S100 calcium binding protein A8 6279 S100A8 NA
This gene encodes a member of carboxypeptidase A protein family. The encoded protein may function as a transcriptional repressor and play a role in adipogenesis and smooth muscle cell differentiation. Studies in mice suggest that this gene functions in wound healing and abdominal wall development. Overexpression of this gene is associated with glioblastoma. ENSG00000106624 AE binding protein 1 165 AEBP1 NA
The protein encoded by this gene is a secreted, extracellular matrix protein containing an Arg-Gly-Asp (RGD) motif and calcium-binding EGF-like domains. It promotes adhesion of endothelial cells through interaction of integrins and the RGD motif. It is prominently expressed in developing arteries but less so in adult vessels. However, its expression is reinduced in balloon-injured vessels and atherosclerotic lesions, notably in intimal vascular smooth muscle cells and endothelial cells. Therefore, the protein encoded by this gene may play a role in vascular development and remodeling. Defects in this gene are a cause of autosomal dominant cutis laxa, autosomal recessive cutis laxa type I (CL type I), and age-related macular degeneration type 3 (ARMD3). ENSG00000140092 fibulin 5 10516 FBLN5 NA
The protein encoded by this gene is a member of the keratin gene family. The keratins are intermediate filament proteins responsible for the structural integrity of epithelial cells and are subdivided into cytokeratins and hair keratins. Most of the type I cytokeratins consist of acidic proteins which are arranged in pairs of heterotypic keratin chains. This type I cytokeratin is paired with keratin 4 and expressed in the suprabasal layers of non-cornified stratified epithelia. Mutations in this gene and keratin 4 have been associated with the autosomal dominant disorder White Sponge Nevus. The type I cytokeratins are clustered in a region of chromosome 17q21.2. Alternative splicing of this gene results in multiple transcript variants; however, not all variants have been described. ENSG00000171401 keratin 13 3860 KRT13 NA
The product encoded by this gene belongs to the actin family of proteins, which are highly conserved proteins that play a role in cell motility, structure and integrity. Alpha, beta and gamma actin isoforms have been identified, with alpha actins being a major constituent of the contractile apparatus, while beta and gamma actins are involved in the regulation of cell motility. This actin is an alpha actin that is found in skeletal muscle. Mutations in this gene cause nemaline myopathy type 3, congenital myopathy with excess of thin myofilaments, congenital myopathy with cores, and congenital myopathy with fiber-type disproportion, diseases that lead to muscle fiber defects. ENSG00000143632 actin, alpha 1, skeletal muscle 58 ACTA1 NA
This gene encodes a putative transcription factor with two LIM zinc-binding domains. The encoded protein may participate in the differentiation of smooth muscle tissue. Alternative splicing results in multiple transcript variants. ENSG00000182809 cysteine rich protein 2 1397 CRIP2 NA
This gene encodes a member of the glyceraldehyde-3-phosphate dehydrogenase protein family. The encoded protein has been identified as a moonlighting protein based on its ability to perform mechanistically distinct functions. The product of this gene catalyzes an important energy-yielding step in carbohydrate metabolism, the reversible oxidative phosphorylation of glyceraldehyde-3-phosphate in the presence of inorganic phosphate and nicotinamide adenine dinucleotide (NAD). The encoded protein has additionally been identified to have uracil DNA glycosylase activity in the nucleus. Also, this protein contains a peptide that has antimicrobial activity against E. coli, P. aeruginosa, and C. albicans. Studies of a similar protein in mouse have assigned a variety of additional functions including nitrosylation of nuclear proteins, the regulation of mRNA stability, and acting as a transferrin receptor on the cell surface of macrophage. Many pseudogenes similar to this locus are present in the human genome. Alternative splicing results in multiple transcript variants. ENSG00000111640 glyceraldehyde-3-phosphate dehydrogenase 2597 GAPDH NA
This gene encodes a component of high density lipoprotein that has no marked similarity to other apolipoprotein sequences. It has a high degree of homology to plasma retinol-binding protein and other members of the alpha 2 microglobulin protein superfamily of carrier proteins, also known as lipocalins. This glycoprotein is closely associated with the enzyme lecithin:cholesterol acyltransferase - an enzyme involved in lipoprotein metabolism. ENSG00000189058 apolipoprotein D 347 APOD NA
The protein encoded by this gene is a leucine-rich repeat protein present in connective tissue extracellular matrix. This protein functions as a molecule anchoring basement membranes to the underlying connective tissue. This protein has been shown to bind type I collagen to basement membranes and type II collagen to cartilage. It also binds the basement membrane heparan sulfate proteoglycan perlecan. This protein is suggested to be involved in the pathogenesis of Hutchinson-Gilford progeria (HGP), which is reported to lack the binding of collagen in basement membranes and cartilage. Alternatively spliced transcript variants encoding the same protein have been observed. ENSG00000188783 proline and arginine rich end leucine rich repeat protein 5549 PRELP NA
The protein encoded by this gene is a member of the S100 family of proteins containing 2 EF-hand calcium-binding motifs. S100 proteins are localized in the cytoplasm and/or nucleus of a wide range of cells, and involved in the regulation of a number of cellular processes such as cell cycle progression and differentiation. S100 genes include at least 13 members which are located as a cluster on chromosome 1q21. This protein may function in motility, invasion, and tubulin polymerization. Chromosomal rearrangements and altered expression of this gene have been implicated in tumor metastasis. Multiple alternatively spliced variants, encoding the same protein, have been identified. ENSG00000196154 S100 calcium binding protein A4 6275 S100A4 NA
The protein encoded by this gene is a member of the S100 family of proteins containing 2 EF-hand calcium-binding motifs. S100 proteins are localized in the cytoplasm and/or nucleus of a wide range of cells, and involved in the regulation of a number of cellular processes such as cell cycle progression and differentiation. S100 genes include at least 13 members which are located as a cluster on chromosome 1q21. This protein may function in stimulation of Ca2+-dependent insulin release, stimulation of prolactin secretion, and exocytosis. Chromosomal rearrangements and altered expression of this gene have been implicated in melanoma. ENSG00000197956 S100 calcium binding protein A6 6277 S100A6 NA
NA ENSG00000129353 solute carrier family 44 member 2 57153 SLC44A2 NA
This gene encodes a highly conserved preproprotein that is proteolytically processed to generate four main cleavage products including saposins A, B, C, and D. Each domain of the precursor protein is approximately 80 amino acid residues long with nearly identical placement of cysteine residues and glycosylation sites. Saposins A-D localize primarily to the lysosomal compartment where they facilitate the catabolism of glycosphingolipids with short oligosaccharide groups. The precursor protein exists both as a secretory protein and as an integral membrane protein and has neurotrophic activities. Mutations in this gene have been associated with Gaucher disease and metachromatic leukodystrophy. Alternative splicing results in multiple transcript variants, at least one of which encodes an isoform that is proteolytically processed. ENSG00000197746 prosaposin 5660 PSAP NA
This gene is specifically expressed in Schwann cells of the peripheral nervous system and encodes a type I transmembrane glycoprotein that is a major structural protein of the peripheral myelin sheath. The encoded protein contains a large hydrophobic extracellular domain and a smaller basic intracellular domain, which are essential for the formation and stabilization of the multilamellar structure of the compact myelin. Mutations in this gene are associated with autosomal dominant form of Charcot-Marie-Tooth disease type 1 (CMT1B) and other polyneuropathies, such as Dejerine-Sottas syndrome (DSS) and congenital hypomyelinating neuropathy (CHN). A recent study showed that two isoforms are produced from the same mRNA by use of alternative in-frame translation termination codons via a stop codon readthrough mechanism. ENSG00000158887 myelin protein zero 4359 MPZ NA
This gene encodes the perlecan protein, which consists of a core protein to which three long chains of glycosaminoglycans (heparan sulfate or chondroitin sulfate) are attached. The perlecan protein is a large multidomain proteoglycan that binds to and cross-links many extracellular matrix components and cell-surface molecules. It has been shown that this protein interacts with laminin, prolargin, collagen type IV, FGFBP1, FBLN2, FGF7 and transthyretin, etc., and it plays essential roles in multiple biological activities. Perlecan is a key component of the vascular extracellular matrix, where it helps to maintain the endothelial barrier function. It is a potent inhibitor of smooth muscle cell proliferation and is thus thought to help maintain vascular homeostasis. It can also promote growth factor (e.g., FGF2) activity and thus stimulate endothelial growth and re-generation. It is a major component of basement membranes, where it is involved in the stabilization of other molecules as well as being involved with glomerular permeability to macromolecules and cell adhesion. Mutations in this gene cause Schwartz-Jampel syndrome type 1, Silverman-Handmaker type of dyssegmental dysplasia, and tardive dyskinesia. Alternative splicing of this gene results in multiple transcript variants. ENSG00000142798 heparan sulfate proteoglycan 2 3339 HSPG2 NA
NA ENSG00000117289 NA NA NA TRUE
This gene encodes a member of the WNT1 inducible signaling pathway (WISP) protein subfamily, which belongs to the connective tissue growth factor (CTGF) family. WNT1 is a member of a family of cysteine-rich, glycosylated signaling proteins that mediate diverse developmental processes. The CTGF family members are characterized by four conserved cysteine-rich domains: insulin-like growth factor-binding domain, von Willebrand factor type C module, thrombospondin domain and C-terminal cystine knot-like (CT) domain. The encoded protein lacks the CT domain which is implicated in dimerization and heparin binding. It is 72% identical to the mouse protein at the amino acid level. This gene may be downstream in the WNT1 signaling pathway that is relevant to malignant transformation. Its expression in colon tumors is reduced while the other two WISP members are overexpressed in colon tumors. It is expressed at high levels in bone tissue, and may play an important role in modulating bone turnover. ENSG00000064205 WNT1 inducible signaling pathway protein 2 8839 WISP2 NA
NA ENSG00000124942 AHNAK nucleoprotein 79026 AHNAK NA
Actins are highly conserved proteins that are involved in various types of cell motility and in the maintenance of the cytoskeleton. Three types of actins, alpha, beta and gamma, have been identified in vertebrates. Alpha actins are found in muscle tissues and are a major constituent of the contractile apparatus. The beta and gamma actins co-exist in most cell types as components of the cytoskeleton and as mediators of internal cell motility. This gene encodes actin gamma 2; a smooth muscle actin found in enteric tissues. Alternative splicing results in multiple transcript variants encoding distinct isoforms. Based on similarity to peptide cleavage of related actins, the mature protein of this gene is formed by removal of two N-terminal peptides. ENSG00000163017 actin, gamma 2, smooth muscle, enteric 72 ACTG2 NA
This gene encodes the pro-alpha2 chain of type I collagen whose triple helix comprises two alpha1 chains and one alpha2 chain. Type I is a fibril-forming collagen found in most connective tissues and is abundant in bone, cornea, dermis and tendon. Mutations in this gene are associated with osteogenesis imperfecta types I-IV, Ehlers-Danlos syndrome type VIIB, recessive Ehlers-Danlos syndrome Classical type, idiopathic osteoporosis, and atypical Marfan syndrome. Symptoms associated with mutations in this gene, however, tend to be less severe than mutations in the gene for the alpha1 chain of type I collagen (COL1A1) reflecting the different role of alpha2 chains in matrix integrity. Three transcripts, resulting from the use of alternate polyadenylation signals, have been identified for this gene. ENSG00000164692 collagen type I alpha 2 chain 1278 COL1A2 NA
This gene is a member of the aggrecan/versican proteoglycan family. The protein encoded is a large chondroitin sulfate proteoglycan and is a major component of the extracellular matrix. This protein is involved in cell adhesion, proliferation, migration and angiogenesis and plays a central role in tissue morphogenesis and maintenance. Mutations in this gene are the cause of Wagner syndrome type 1. Multiple transcript variants encoding different isoforms have been found for this gene. ENSG00000038427 versican 1462 VCAN NA
The protein encoded by this gene is a major non-neuronal microtubule-associated protein. This protein contains a domain similar to the microtubule-binding domains of neuronal microtubule-associated protein (MAP2) and microtubule-associated protein tau (MAPT/TAU). This protein promotes microtubule assembly, and has been shown to counteract destabilization of interphase microtubule catastrophe promotion. Cyclin B was found to interact with this protein, which targets cell division cycle 2 (CDC2) kinase to microtubules. The phosphorylation of this protein affects microtubule properties and cell cycle progression. Multiple transcript variants encoding different isoforms have been found for this gene. ENSG00000047849 microtubule associated protein 4 4134 MAP4 NA
This gene encodes a member of the fibulin family of extracellular matrix glycoproteins. Like all members of this family, the encoded protein contains tandemly repeated epidermal growth factor-like repeats followed by a C-terminus fibulin-type domain. This gene is upregulated in malignant gliomas and may play a role in the aggressive nature of these tumors. Mutations in this gene are associated with Doyne honeycomb retinal dystrophy. Alternatively spliced transcript variants that encode the same protein have been described. ENSG00000115380 EGF containing fibulin like extracellular matrix protein 1 2202 EFEMP1 NA
APM2 gene is exclusively expressed in adipose tissue. Its function is currently unknown. ENSG00000148671 adipogenesis regulatory factor 10974 ADIRF NA
Fibromodulin belongs to the family of small interstitial proteoglycans. The encoded protein possesses a central region containing leucine-rich repeats with 4 keratan sulfate chains, flanked by terminal domains containing disulphide bonds. Owing to the interaction with type I and type II collagen fibrils and in vitro inhibition of fibrillogenesis, the encoded protein may play a role in the assembly of extracellular matrix. It may also regulate TGF-beta activities by sequestering TGF-beta into the extracellular matrix. Sequence variations in this gene may be associated with the pathogenesis of high myopia. Alternative splicing results in multiple transcript variants. ENSG00000122176 fibromodulin 2331 FMOD NA
This gene encodes a major glucose transporter in the mammalian blood-brain barrier. The encoded protein is found primarily in the cell membrane and on the cell surface, where it can also function as a receptor for human T-cell leukemia virus (HTLV) I and II. Mutations in this gene have been found in a family with paroxysmal exertion-induced dyskinesia. ENSG00000117394 solute carrier family 2 member 1 6513 SLC2A1 NA
This gene encodes cytochrome b5 reductase, which includes a membrane-bound form in somatic cells (anchored in the endoplasmic reticulum, mitochondrial and other membranes) and a soluble form in erythrocytes. The membrane-bound form exists mainly on the cytoplasmic side of the endoplasmic reticulum and functions in desaturation and elongation of fatty acids, in cholesterol biosynthesis, and in drug metabolism. The erythrocyte form is located in a soluble fraction of circulating erythrocytes and is involved in methemoglobin reduction. The membrane-bound form has both membrane-binding and catalytic domains, while the soluble form has only the catalytic domain. Alternate splicing results in multiple transcript variants. Mutations in this gene cause methemoglobinemias. ENSG00000100243 cytochrome b5 reductase 3 1727 CYB5R3 NA
The protein encoded by this gene belongs to the thrombospondin family. It is a disulfide-linked homotrimeric glycoprotein that mediates cell-to-cell and cell-to-matrix interactions. This protein has been shown to function as a potent inhibitor of tumor growth and angiogenesis. Studies of the mouse counterpart suggest that this protein may modulate the cell surface properties of mesenchymal cells and be involved in cell adhesion and migration. ENSG00000186340 thrombospondin 2 7058 THBS2 NA
Mammalian lens crystallins are divided into alpha, beta, and gamma families. Alpha crystallins are composed of two gene products: alpha-A and alpha-B, for acidic and basic, respectively. Alpha crystallins can be induced by heat shock and are members of the small heat shock protein (HSP20) family. They act as molecular chaperones although they do not renature proteins and release them in the fashion of a true chaperone; instead they hold them in large soluble aggregates. Post-translational modifications decrease the ability to chaperone. These heterogeneous aggregates consist of 30-40 subunits; the alpha-A and alpha-B subunits have a 3:1 ratio, respectively. Two additional functions of alpha crystallins are an autokinase activity and participation in the intracellular architecture. The encoded protein has been identified as a moonlighting protein based on its ability to perform mechanistically distinct functions. Alpha-A and alpha-B gene products are differentially expressed; alpha-A is preferentially restricted to the lens and alpha-B is expressed widely in many tissues and organs. Elevated expression of alpha-B crystallin occurs in many neurological diseases; a missense mutation cosegregated in a family with a desmin-related myopathy. Alternative splicing results in multiple transcript variants. ENSG00000109846 crystallin alpha B 1410 CRYAB NA
This gene encodes a member of the serine proteinase inhibitor (serpin) superfamily. This member is the principal inhibitor of tissue plasminogen activator (tPA) and urokinase (uPA), and hence is an inhibitor of fibrinolysis. Defects in this gene are the cause of plasminogen activator inhibitor-1 deficiency (PAI-1 deficiency), and high concentrations of the gene product are associated with thrombophilia. Alternatively spliced transcript variants encoding different isoforms have been found for this gene. ENSG00000106366 serpin family E member 1 5054 SERPINE1 NA
This gene encodes a protein involved in peripheral nerve myelin upkeep. The encoded protein contains 2 PDZ domains which were named after PSD95 (post synaptic density protein), DlgA (Drosophila disc large tumor suppressor), and ZO1 (a mammalian tight junction protein). Two alternatively spliced transcript variants have been described for this gene which encode different protein isoforms and which are targeted differently in the Schwann cell. Mutations in this gene cause Charcot-Marie-Tooth neuropathy, type 4F and Dejerine-Sottas neuropathy. ENSG00000105227 periaxin 57716 PRX NA
NA ENSG00000176658 myosin ID 4642 MYO1D NA
Alpha actinins belong to the spectrin gene superfamily which represents a diverse group of cytoskeletal proteins, including the alpha and beta spectrins and dystrophins. Alpha actinin is an actin-binding protein with multiple roles in different cell types. In nonmuscle cells, the cytoskeletal isoform is found along microfilament bundles and adherens-type junctions, where it is involved in binding actin to the membrane. In contrast, skeletal, cardiac, and smooth muscle isoforms are localized to the Z-disc and analogous dense bodies, where they help anchor the myofibrillar actin filaments. This gene encodes a nonmuscle, alpha actinin isoform which is concentrated in the cytoplasm, and thought to be involved in metastatic processes. Mutations in this gene have been associated with focal and segmental glomerulosclerosis. ENSG00000130402 actinin alpha 4 81 ACTN4 NA
NA ENSG00000119280 chromosome 1 open reading frame 198 84886 C1orf198 NA
The protein encoded by this gene is a member of the keratin gene family. The type II cytokeratins consist of basic or neutral proteins which are arranged in pairs of heterotypic keratin chains coexpressed during differentiation of simple and stratified epithelial tissues. This type II cytokeratin is specifically expressed in differentiated layers of the mucosal and esophageal epithelia with family member KRT13. Mutations in these genes have been associated with White Sponge Nevus, characterized by oral, esophageal, and anal leukoplakia. The type II cytokeratins are clustered in a region of chromosome 12q12-q13. ENSG00000170477 keratin 4 3851 KRT4 NA
NA ENSG00000082781 integrin subunit beta 5 3693 ITGB5 NA
NA ENSG00000167779 insulin like growth factor binding protein 6 3489 IGFBP6 NA
NA ENSG00000099994 sushi domain containing 2 56241 SUSD2 NA
This gene encodes a receptor for inositol 1,4,5-trisphosphate, a second messenger that mediates the release of intracellular calcium. The receptor contains a calcium channel at the C-terminus and the ligand-binding site at the N-terminus. Knockout studies in mice suggest that type 2 and type 3 inositol 1,4,5-trisphosphate receptors play a key role in exocrine secretion underlying energy metabolism and growth. ENSG00000096433 inositol 1,4,5-trisphosphate receptor type 3 3710 ITPR3 NA
This gene encodes a member of the A1 family of peptidases. The encoded preproprotein is proteolytically processed to generate multiple protein products. These products include the cathepsin D light and heavy chains, which heterodimerize to form the mature enzyme. This enzyme exhibits pepsin-like activity and plays a role in protein turnover and in the proteolytic activation of hormones and growth factors. Mutations in this gene play a causal role in neuronal ceroid lipofuscinosis-10 and may be involved in the pathogenesis of several other diseases, including breast cancer and possibly Alzheimer’s disease. ENSG00000117984 cathepsin D 1509 CTSD NA
The protein encoded by this gene is a glutathione-independent prostaglandin D synthase that catalyzes the conversion of prostaglandin H2 (PGH2) to prostaglandin D2 (PGD2). PGD2 functions as a neuromodulator as well as a trophic factor in the central nervous system. PGD2 is also involved in smooth muscle contraction/relaxation and is a potent inhibitor of platelet aggregation. This gene is preferentially expressed in brain. Studies with transgenic mice overexpressing this gene suggest that this gene may be also involved in the regulation of non-rapid eye movement sleep. ENSG00000107317 prostaglandin D2 synthase 5730 PTGDS NA
This gene is a member of the matrix metalloproteinase (MMP) gene family, that are zinc-dependent enzymes capable of cleaving components of the extracellular matrix and molecules involved in signal transduction. The protein encoded by this gene is a gelatinase A, type IV collagenase, that contains three fibronectin type II repeats in its catalytic site that allow binding of denatured type IV and V collagen and elastin. Unlike most MMP family members, activation of this protein can occur on the cell membrane. This enzyme can be activated extracellularly by proteases, or, intracellularly by its S-glutathiolation with no requirement for proteolytical removal of the pro-domain. This protein is thought to be involved in multiple pathways including roles in the nervous system, endometrial menstrual breakdown, regulation of vascularization, and metastasis. Mutations in this gene have been associated with Winchester syndrome and Nodulosis-Arthropathy-Osteolysis (NAO) syndrome. Alternative splicing results in multiple transcript variants encoding different isoforms. ENSG00000087245 matrix metallopeptidase 2 4313 MMP2 NA
This gene encodes a member of the low-density lipoprotein receptor family of proteins. The encoded preproprotein is proteolytically processed by furin to generate 515 kDa and 85 kDa subunits that form the mature receptor (PMID: 8546712). This receptor is involved in several cellular processes, including intracellular signaling, lipid homeostasis, and clearance of apoptotic cells. In addition, the encoded protein is necessary for the alpha 2-macroglobulin-mediated clearance of secreted amyloid precursor protein and beta-amyloid, the main component of amyloid plaques found in Alzheimer patients. Expression of this gene decreases with age and has been found to be lower than controls in brain tissue from Alzheimer’s disease patients. ENSG00000123384 LDL receptor related protein 1 4035 LRP1 NA
This gene encodes a protein containing a calponin homology (CH) domain, a PDZ domain, and a LIM domain, and may be involved in protein-protein interactions. Several alternatively spliced transcript variants encoding different isoforms have been found for this gene, however, the full-length nature of some variants is not known. ENSG00000136153 LIM domain 7 4008 LMO7 NA
This gene encodes one of the two alpha chains of type VIII collagen. The gene product is a short chain collagen and a major component of the basement membrane of the corneal endothelium. The type VIII collagen fibril can be either a homo- or a heterotrimer. Alternatively spliced transcript variants encoding the same protein have been observed. ENSG00000144810 collagen type VIII alpha 1 1295 COL8A1 NA
This gene encodes a transmembrane protein containing six cysteine-rich repeat domains and an insulin-like growth factor-binding domain. The encoded protein may play a role in tissue development through interactions with members of the transforming growth factor beta family, such as bone morphogenetic proteins. ENSG00000150938 cysteine rich transmembrane BMP regulator 1 (chordin-like) 51232 CRIM1 NA
Syntrophins are cytoplasmic peripheral membrane scaffold proteins that are components of the dystrophin-associated protein complex. This gene is a member of the syntrophin gene family and encodes the most common syntrophin isoform found in cardiac tissues. The N-terminal PDZ domain of this syntrophin protein interacts with the C-terminus of the pore-forming alpha subunit (SCN5A) of the cardiac sodium channel Nav1.5. This protein also associates cardiac sodium channels with the nitric oxide synthase-PMCA4b (plasma membrane Ca-ATPase subtype 4b) complex in cardiomyocytes. This gene is a susceptibility locus for Long-QT syndrome (LQT) - an inherited disorder associated with sudden cardiac death from arrhythmia - and sudden infant death syndrome (SIDS). This protein also associates with dystrophin and dystrophin-related proteins at the neuromuscular junction and alters intracellular calcium ion levels in muscle tissue. ENSG00000101400 syntrophin alpha 1 6640 SNTA1 NA
Alpha actinins belong to the spectrin gene superfamily which represents a diverse group of cytoskeletal proteins, including the alpha and beta spectrins and dystrophins. Alpha actinin is an actin-binding protein with multiple roles in different cell types. In nonmuscle cells, the cytoskeletal isoform is found along microfilament bundles and adherens-type junctions, where it is involved in binding actin to the membrane. In contrast, skeletal, cardiac, and smooth muscle isoforms are localized to the Z-disc and analogous dense bodies, where they help anchor the myofibrillar actin filaments. This gene encodes a muscle-specific, alpha actinin isoform that is expressed in both skeletal and cardiac muscles. Several transcript variants encoding different isoforms have been found for this gene. ENSG00000077522 actinin alpha 2 88 ACTN2 NA
ERRFI1 is a cytoplasmic protein whose expression is upregulated with cell growth (Wick et al., 1995 [PubMed 7641805]). It shares significant homology with the protein product of rat gene-33, which is induced during cell stress and mediates cell signaling (Makkinje et al., 2000 [PubMed 10749885]; Fiorentino et al., 2000 [PubMed 11003669]). ENSG00000116285 ERBB receptor feedback inhibitor 1 54206 ERRFI1 NA
The protein encoded by this gene is a member of a family of membrane glycoproteins. This glycoprotein provides selectins with carbohydrate ligands. It may also play a role in tumor cell metastasis. ENSG00000185896 lysosomal associated membrane protein 1 3916 LAMP1 NA
This gene encodes a member of the epidermal growth factor (EGF) receptor family of receptor tyrosine kinases. This protein has no ligand binding domain of its own and therefore cannot bind growth factors. However, it does bind tightly to other ligand-bound EGF receptor family members to form a heterodimer, stabilizing ligand binding and enhancing kinase-mediated activation of downstream signalling pathways, such as those involving mitogen-activated protein kinase and phosphatidylinositol-3 kinase. Allelic variations at amino acid positions 654 and 655 of isoform a (positions 624 and 625 of isoform b) have been reported, with the most common allele, Ile654/Ile655, shown here. Amplification and/or overexpression of this gene has been reported in numerous cancers, including breast and ovarian tumors. Alternative splicing results in several additional transcript variants, some encoding different isoforms and others that have not been fully characterized. ENSG00000141736 erb-b2 receptor tyrosine kinase 2 2064 ERBB2 NA
This gene encodes a PDZ domain-containing protein. PDZ motifs are modular protein-protein interaction domains consisting of 80-120 amino acid residues. PDZ domain-containing proteins interact with each other in cytoskeletal assembly or with other proteins involved in targeting and clustering of membrane proteins. The protein encoded by this gene interacts with alpha-actinin-2 through its N-terminal PDZ domain and with protein kinase C via its C-terminal LIM domains. The LIM domain is a cysteine-rich motif defined by 50-60 amino acids containing two zinc-binding modules. This protein also interacts with all three members of the myozenin family. Mutations in this gene have been associated with myofibrillar myopathy and dilated cardiomyopathy. Alternatively spliced transcript variants encoding different isoforms have been identified; all isoforms have N-terminal PDZ domains while only longer isoforms (1, 2 and 5) have C-terminal LIM domains. ENSG00000122367 LIM domain binding 3 11155 LDB3 NA
This gene encodes a protein that catalyzes the condensation of nicotinamide with 5-phosphoribosyl-1-pyrophosphate to yield nicotinamide mononucleotide, one step in the biosynthesis of nicotinamide adenine dinucleotide. The protein belongs to the nicotinic acid phosphoribosyltransferase (NAPRTase) family and is thought to be involved in many important biological processes, including metabolism, stress response and aging. This gene has a pseudogene on chromosome 10. ENSG00000105835 nicotinamide phosphoribosyltransferase 10135 NAMPT NA
The protein encoded by this gene was identified as a binding protein of the protein kinase C, delta (PRKCD). The expression of this gene in cultured cell lines is strongly induced by serum starvation. The expression of this protein was found to be down-regulated in various cancer cell lines, suggesting the possible tumor suppressor function of this protein. ENSG00000170955 protein kinase C delta binding protein 112464 PRKCDBP NA
This gene encodes an integral membrane protein that is a major component of myelin in the peripheral nervous system. Studies suggest two alternately used promoters drive tissue-specific expression. Various mutations of this gene are causes of Charcot-Marie-Tooth disease Type IA, Dejerine-Sottas syndrome, and hereditary neuropathy with liability to pressure palsies. Alternative splicing results in multiple transcript variants. ENSG00000109099 peripheral myelin protein 22 5376 PMP22 NA
NA ENSG00000151468 coiled-coil domain containing 3 83643 CCDC3 NA
The membrane-associated protein encoded by this gene is a member of the superfamily of ATP-binding cassette (ABC) transporters. ABC proteins transport various molecules across extra- and intracellular membranes. ABC genes are divided into seven distinct subfamilies (ABC1, MDR/TAP, MRP, ALD, OABP, GCN20, White). This protein is a member of the ABC1 subfamily. Members of the ABC1 subfamily comprise the only major ABC subfamily found exclusively in multicellular eukaryotes. This protein is highly expressed in brain tissue and may play a role in macrophage lipid metabolism and neural development. Two transcript variants encoding different isoforms have been found for this gene. ENSG00000107331 ATP binding cassette subfamily A member 2 20 ABCA2 NA
The protein encoded by this gene is a member of the serpin family of proteinase inhibitors. Members of this family maintain homeostasis by neutralizing overexpressed proteinase activity through their function as suicide substrates. This protein inhibits the neutrophil-derived proteinases neutrophil elastase, cathepsin G, and proteinase-3 and thus protects tissues from damage at inflammatory sites. Alternative splicing results in multiple transcript variants. ENSG00000021355 serpin family B member 1 1992 SERPINB1 NA
Secreted frizzled-related protein 4 (SFRP4) is a member of the SFRP family that contains a cysteine-rich domain homologous to the putative Wnt-binding site of Frizzled proteins. SFRPs act as soluble modulators of Wnt signaling. The expression of SFRP4 in ventricular myocardium correlates with apoptosis related gene expression. ENSG00000106483 secreted frizzled related protein 4 6424 SFRP4 NA
NA ENSG00000091986 coiled-coil domain containing 80 151887 CCDC80 NA
This antimicrobial gene belongs to the cytokine gene family which encode secreted proteins involved in immunoregulatory and inflammatory processes. The protein encoded by this gene is structurally related to the CXC (Cys-X-Cys) subfamily of cytokines. Members of this subfamily are characterized by two cysteines separated by a single amino acid. This cytokine displays chemotactic activity for monocytes but not for lymphocytes, dendritic cells, neutrophils or macrophages. It has been implicated that this cytokine is involved in the homeostasis of monocyte-derived macrophages rather than in inflammation. ENSG00000145824 C-X-C motif chemokine ligand 14 9547 CXCL14 NA
This gene encodes a nonsarcomeric myosin regulatory light chain. This protein is activated by phosphorylation and regulates smooth muscle and non-muscle cell contraction. This protein may also be involved in DNA damage repair by sequestering the transcriptional regulator apoptosis-antagonizing transcription factor (AATF)/Che-1 which functions as a repressor of p53-driven apoptosis. Alternate splicing results in multiple transcript variants. A pseudogene of this gene is found on chromosome 8. ENSG00000101608 myosin light chain 12A 10627 MYL12A NA
NA ENSG00000136205 tensin 3 64759 TNS3 NA
This gene encodes a protein that enables the dissociation of paused ternary polymerase I transcription complexes from the 3’ end of pre-rRNA transcripts. This protein regulates rRNA transcription by promoting the dissociation of transcription complexes and the reinitiation of polymerase I on nascent rRNA transcripts. This protein also localizes to caveolae at the plasma membrane and is thought to play a critical role in the formation of caveolae and the stabilization of caveolins. This protein translocates from caveolae to the cytoplasm after insulin stimulation. Caveolae contain truncated forms of this protein and may be the site of phosphorylation-dependent proteolysis. This protein is also thought to modify lipid metabolism and insulin-regulated gene expression. Mutations in this gene result in a disorder characterized by generalized lipodystrophy and muscular dystrophy. ENSG00000177469 polymerase I and transcript release factor 284119 PTRF NA
Albumin is a soluble, monomeric protein which comprises about one-half of the blood serum protein. Albumin functions primarily as a carrier protein for steroids, fatty acids, and thyroid hormones and plays a role in stabilizing extracellular fluid volume. Albumin is a globular unglycosylated serum protein of molecular weight 65,000. Albumin is synthesized in the liver as preproalbumin which has an N-terminal peptide that is removed before the nascent protein is released from the rough endoplasmic reticulum. The product, proalbumin, is in turn cleaved in the Golgi vesicles to produce the secreted albumin. ENSG00000163631 albumin 213 ALB NA
NA ENSG00000163209 small proline rich protein 3 6707 SPRR3 NA
Kruppel-like factors (KLFs) are a family of broadly expressed zinc finger transcription factors. KLF2 regulates T-cell trafficking by promoting expression of the lipid-binding receptor S1P1 (S1PR1; MIM 601974) and the selectin CD62L (SELL; MIM 153240) (summary by Weinreich et al., 2009 [PubMed 19592277]). ENSG00000127528 Kruppel like factor 2 10365 KLF2 NA
This gene encodes a Pleckstrin homology and SEC7 domains-containing protein that functions as a guanine nucleotide exchange factor. The encoded protein regulates signal transduction by activating ADP-ribosylation factor 6. Alternative splicing results in multiple transcript variants. ENSG00000059915 pleckstrin and Sec7 domain containing 5662 PSD NA
The protein encoded by this gene binds to the ‘plus’ ends of actin monomers and filaments to prevent monomer exchange. The encoded calcium-regulated protein functions in both assembly and disassembly of actin filaments. Defects in this gene are a cause of familial amyloidosis Finnish type (FAF). Multiple transcript variants encoding several different isoforms have been found for this gene. ENSG00000148180 gelsolin 2934 GSN NA
This gene encodes a member of the chaperonin family. The encoded mitochondrial protein may function as a signaling molecule in the innate immune system. This protein is essential for the folding and assembly of newly imported proteins in the mitochondria. This gene is adjacent to a related family member and the region between the 2 genes functions as a bidirectional promoter. Several pseudogenes have been associated with this gene. Two transcript variants encoding the same protein have been identified for this gene. Mutations associated with this gene cause autosomal recessive spastic paraplegia 13. ENSG00000144381 heat shock protein family D (Hsp60) member 1 3329 HSPD1 NA
This gene encodes an SH3 domain-containing adaptor protein. The presence of SH3 domains plays a role in this protein’s ability to bind other cytoplasmic molecules and contribute to cytoskeletal organization, cell adhesion and migration, signaling, and gene expression. Multiple transcript variants encoding different isoforms have been found for this gene. ENSG00000120896 sorbin and SH3 domain containing 3 10174 SORBS3 NA
This gene shares both structural and functional similarities with the dystrophin gene. It contains an actin-binding N-terminus, a triple coiled-coil repeat central region, and a C-terminus that consists of protein-protein interaction motifs which interact with dystroglycan protein components. The protein encoded by this gene is located at the neuromuscular synapse and myotendinous junctions, where it participates in post-synaptic membrane maintenance and acetylcholine receptor clustering. Mouse studies suggest that this gene may serve as a functional substitute for the dystrophin gene and therefore, may serve as a potential therapeutic alternative to muscular dystrophy which is caused by mutations in the dystrophin gene. Alternative splicing of the utrophin gene has been described; however, the full-length nature of these variants has not yet been determined. ENSG00000152818 utrophin 7402 UTRN NA
NA ENSG00000229732 NA ENSG00000229732 AC019349.5 NA
Integrins are integral transmembrane glycoproteins composed of noncovalently linked alpha and beta chains. They participate in cell adhesion as well as cell-surface mediated signalling. This gene encodes an integrin alpha chain and is expressed at high levels in chondrocytes, where it is transcriptionally regulated by AP-2epsilon and Ets-1. The protein encoded by this gene binds to collagen. Alternative splicing results in multiple transcript variants. ENSG00000143127 integrin subunit alpha 10 8515 ITGA10 NA
This gene encodes a beta integrin-related protein that is a member of the EGF-like protein family. The encoded protein contains integrin-like cysteine-rich repeats. Alternative splicing results in multiple transcript variants. ENSG00000198542 integrin subunit beta like 1 9358 ITGBL1 NA
This gene encodes the alpha chain of type XVI collagen, a member of the FACIT collagen family (fibril-associated collagens with interrupted helices). Members of this collagen family are found in association with fibril-forming collagens such as type I and II, and serve to maintain the integrity of the extracellular matrix. High levels of type XVI collagen have been found in fibroblasts and keratinocytes, and in smooth muscle and amnion. ENSG00000084636 collagen type XVI alpha 1 chain 1307 COL16A1 NA
This gene encodes a muscle-specific class III intermediate filament. Homopolymers of this protein form a stable intracytoplasmic filamentous network connecting myofibrils to each other and to the plasma membrane. Mutations in this gene are associated with desmin-related myopathy, a familial cardiac and skeletal myopathy (CSM), and with distal myopathies. ENSG00000175084 desmin 1674 DES NA
This gene encodes an enzyme that oxidizes methionine residues on actin, thereby promoting depolymerization of actin filaments. This protein interacts with and regulates signalling by NEDD9/CAS-L (neural precursor cell expressed, developmentally down-regulated 9). Alternative splicing results in multiple transcript variants. ENSG00000135596 microtubule associated monooxygenase, calponin and LIM domain containing 1 64780 MICAL1 NA
# Save the Ensembl IDs queried for factor 12, one per line, for downstream use
write.table(as.factor(out$query),
            paste0("../utilities/GTEX2013_sparse_fac_sqrt/gene_names_clus_", 12, ".txt"),
            col.names = FALSE, row.names = FALSE, quote = FALSE)
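The export step above hard-codes the factor number in the output file name and is repeated once per factor. A minimal sketch of a reusable helper is given here, using base R only. The name `write_factor_genes` and its arguments are hypothetical and not part of the original analysis. The demo writes to a temporary directory instead of `../utilities/GTEX2013_sparse_fac_sqrt`, and the de-duplication step is an assumption about what the downstream files should contain.

```r
# Hypothetical helper (not from the original analysis): write the Ensembl IDs
# queried for one factor to "gene_names_clus_<k>.txt" inside out_dir.
write_factor_genes <- function(ids, factor_index, out_dir) {
  path <- file.path(out_dir, paste0("gene_names_clus_", factor_index, ".txt"))
  # Assumption: repeated IDs are dropped, in case the annotation table holds
  # more than one row for the same query.
  ids <- unique(as.character(ids))
  write.table(ids, path, col.names = FALSE, row.names = FALSE, quote = FALSE)
  invisible(path)
}

# Demo with two IDs taken from the factor 12 table, one of them repeated.
demo_ids <- c("ENSG00000152818", "ENSG00000175084", "ENSG00000175084")
demo_path <- write_factor_genes(demo_ids, 12, tempdir())
written <- readLines(demo_path)
```

In the analysis itself, the call would presumably look like `write_factor_genes(out$query, 12, "../utilities/GTEX2013_sparse_fac_sqrt")`, with the factor index changed for each annotation block.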

Factor 13 Annotations

# Annotate the top genes of factor 13 with name, summary, and symbol from MyGene.info
out <- mygene::queryMany(gene_list[13, ], scopes = "ensembl.gene", fields = c("name", "summary", "symbol"), species = "human")
## Finished
## Pass returnall=TRUE to return lists of duplicate or missing query terms.
kable(as.data.frame(out))
query X_id name summary symbol notfound
ENSG00000107796 59 actin, alpha 2, smooth muscle, aorta The protein encoded by this gene belongs to the actin family of proteins, which are highly conserved proteins that play a role in cell motility, structure and integrity. Alpha, beta and gamma actin isoforms have been identified, with alpha actins being a major constituent of the contractile apparatus, while beta and gamma actins are involved in the regulation of cell motility. This actin is an alpha actin that is found in skeletal muscle. Defects in this gene cause aortic aneurysm familial thoracic type 6. Multiple alternatively spliced variants, encoding the same protein, have been identified. ACTA2 NA
ENSG00000149591 6876 transgelin The protein encoded by this gene is a transformation and shape-change sensitive actin cross-linking/gelling protein found in fibroblasts and smooth muscle. Its expression is down-regulated in many cell lines, and this down-regulation may be an early and sensitive marker for the onset of transformation. A functional role of this protein is unclear. Two transcript variants encoding the same protein have been found for this gene. TAGLN NA
ENSG00000180139 ENSG00000180139 ACTA2 antisense RNA 1 NA ACTA2-AS1 NA
ENSG00000075624 60 actin, beta This gene encodes one of six different actin proteins. Actins are highly conserved proteins that are involved in cell motility, structure, and integrity. This actin is a major constituent of the contractile apparatus and one of the two nonmuscle cytoskeletal actins. ACTB NA
ENSG00000171401 3860 keratin 13 The protein encoded by this gene is a member of the keratin gene family. The keratins are intermediate filament proteins responsible for the structural integrity of epithelial cells and are subdivided into cytokeratins and hair keratins. Most of the type I cytokeratins consist of acidic proteins which are arranged in pairs of heterotypic keratin chains. This type I cytokeratin is paired with keratin 4 and expressed in the suprabasal layers of non-cornified stratified epithelia. Mutations in this gene and keratin 4 have been associated with the autosomal dominant disorder White Sponge Nevus. The type I cytokeratins are clustered in a region of chromosome 17q21.2. Alternative splicing of this gene results in multiple transcript variants; however, not all variants have been described. KRT13 NA
ENSG00000186395 3858 keratin 10 This gene encodes a member of the type I (acidic) cytokeratin family, which belongs to the superfamily of intermediate filament (IF) proteins. Keratins are heteropolymeric structural proteins which form the intermediate filament. These filaments, along with actin microfilaments and microtubules, compose the cytoskeleton of epithelial cells. Mutations in this gene are associated with epidermolytic hyperkeratosis. This gene is located within a cluster of keratin family members on chromosome 17q21. KRT10 NA
ENSG00000172867 3849 keratin 2 The protein encoded by this gene is a member of the keratin gene family. The type II cytokeratins consist of basic or neutral proteins which are arranged in pairs of heterotypic keratin chains coexpressed during differentiation of simple and stratified epithelial tissues. This type II cytokeratin is expressed largely in the upper spinous layer of epidermal keratinocytes and mutations in this gene have been associated with bullous congenital ichthyosiform erythroderma. The type II cytokeratins are clustered in a region of chromosome 12q12-q13. KRT2 NA
ENSG00000125868 11034 destrin, actin depolymerizing factor The product of this gene belongs to the actin-binding proteins ADF family. This family of proteins is responsible for enhancing the turnover rate of actin in vivo. This gene encodes the actin depolymerizing protein that severs actin filaments (F-actin) and binds to actin monomers (G-actin). Two transcript variants encoding distinct isoforms have been identified for this gene. DSTN NA
ENSG00000145012 4026 LIM domain containing preferred translocation partner in lipoma This gene encodes a member of a subfamily of LIM domain proteins that are characterized by an N-terminal proline-rich region and three C-terminal LIM domains. The encoded protein localizes to the cell periphery in focal adhesions and may be involved in cell-cell adhesion and cell motility. This protein also shuttles through the nucleus and may function as a transcriptional co-activator. This gene is located at the junction of certain disease-related chromosomal translocations, which result in the expression of chimeric proteins that may promote tumor growth. Alternative splicing results in multiple transcript variants. LPP NA
ENSG00000096696 1832 desmoplakin This gene encodes a protein that anchors intermediate filaments to desmosomal plaques and forms an obligate component of functional desmosomes. Mutations in this gene are the cause of several cardiomyopathies and keratodermas, including skin fragility-woolly hair syndrome. Alternative splicing results in multiple transcript variants. DSP NA
ENSG00000163431 25802 leiomodin 1 The leiomodin 1 protein has a putative membrane-spanning region and 2 types of tandemly repeated blocks. The transcript is expressed in all tissues tested, with the highest levels in thyroid, eye muscle, skeletal muscle, and ovary. Increased expression of leiomodin 1 may be linked to Graves’ disease and thyroid-associated ophthalmopathy. LMOD1 NA
ENSG00000115386 5967 regenerating family member 1 alpha This gene is a type I subclass member of the Reg gene family. The Reg gene family is a multigene family grouped into four subclasses, types I, II, III and IV, based on the primary structures of the encoded proteins. This gene encodes a protein that is secreted by the exocrine pancreas. It is associated with islet cell regeneration and diabetogenesis and may be involved in pancreatic lithogenesis. Reg family members REG1B, REGL, PAP and this gene are tandemly clustered on chromosome 2p12 and may have arisen from the same ancestral gene by gene duplication. REG1A NA
ENSG00000148600 92211 cadherin related family member 1 This gene belongs to the cadherin superfamily of calcium-dependent cell adhesion molecules. The encoded protein is a photoreceptor-specific cadherin that plays a role in outer segment disc morphogenesis. Mutations in this gene are associated with inherited retinal dystrophies. Alternatively spliced transcript variants encoding different isoforms have been identified. CDHR1 NA
ENSG00000143248 8490 regulator of G-protein signaling 5 This gene encodes a member of the regulators of G protein signaling (RGS) family. The RGS proteins are signal transduction molecules which are involved in the regulation of heterotrimeric G proteins by acting as GTPase activators. This gene is a hypoxia-inducible factor-1 dependent, hypoxia-induced gene which is involved in the induction of endothelial apoptosis. This gene is also one of three genes on chromosome 1q contributing to elevated blood pressure. Alternatively spliced transcript variants have been identified. RGS5 NA
ENSG00000167768 3848 keratin 1 The protein encoded by this gene is a member of the keratin gene family. The type II cytokeratins consist of basic or neutral proteins which are arranged in pairs of heterotypic keratin chains coexpressed during differentiation of simple and stratified epithelial tissues. This type II cytokeratin is specifically expressed in the spinous and granular layers of the epidermis with family member KRT10 and mutations in these genes have been associated with bullous congenital ichthyosiform erythroderma. The type II cytokeratins are clustered in a region of chromosome 12q12-q13. KRT1 NA
ENSG00000225630 ENSG00000225630 mitochondrially encoded NADH:ubiquinone oxidoreductase core subunit 2 pseudogene 28 NA MTND2P28 NA
ENSG00000133392 4629 myosin, heavy chain 11, smooth muscle The protein encoded by this gene is a smooth muscle myosin belonging to the myosin heavy chain family. The gene product is a subunit of a hexameric protein that consists of two heavy chain subunits and two pairs of non-identical light chain subunits. It functions as a major contractile protein, converting chemical energy into mechanical energy through the hydrolysis of ATP. The gene encoding a human ortholog of rat NUDE1 is transcribed from the reverse strand of this gene, and its 3’ end overlaps with that of the latter. The pericentric inversion of chromosome 16 [inv(16)(p13q22)] produces a chimeric transcript that encodes a protein consisting of the first 165 residues from the N terminus of core-binding factor beta in a fusion with the C-terminal portion of the smooth muscle myosin heavy chain. This chromosomal rearrangement is associated with acute myeloid leukemia of the M4Eo subtype. Alternative splicing generates isoforms that are differentially expressed, with ratios changing during muscle cell maturation. Alternatively spliced transcript variants encoding different isoforms have been identified. MYH11 NA
ENSG00000175084 1674 desmin This gene encodes a muscle-specific class III intermediate filament. Homopolymers of this protein form a stable intracytoplasmic filamentous network connecting myofibrils to each other and to the plasma membrane. Mutations in this gene are associated with desmin-related myopathy, a familial cardiac and skeletal myopathy (CSM), and with distal myopathies. DES NA
ENSG00000229732 ENSG00000229732 NA NA AC019349.5 NA
ENSG00000171476 84525 HOP homeobox The protein encoded by this gene is a homeodomain protein that lacks certain conserved residues required for DNA binding. It was reported that choriocarcinoma cell lines and tissues failed to express this gene, which suggested the possible involvement of this gene in malignant conversion of placental trophoblasts. Studies in mice suggest that this protein may interact with serum response factor (SRF) and modulate SRF-dependent cardiac-specific gene expression and cardiac development. Multiple alternatively spliced transcript variants have been identified for this gene. HOPX NA
ENSG00000184009 71 actin gamma 1 Actins are highly conserved proteins that are involved in various types of cell motility, and maintenance of the cytoskeleton. In vertebrates, three main groups of actin isoforms, alpha, beta and gamma have been identified. The alpha actins are found in muscle tissues and are a major constituent of the contractile apparatus. The beta and gamma actins co-exist in most cell types as components of the cytoskeleton, and as mediators of internal cell motility. Actin, gamma 1, encoded by this gene, is a cytoplasmic actin found in non-muscle cells. Mutations in this gene are associated with DFNA20/26, a subtype of autosomal dominant non-syndromic sensorineural progressive hearing loss. Alternative splicing results in multiple transcript variants. ACTG1 NA
ENSG00000167641 94274 protein phosphatase 1 regulatory inhibitor subunit 14A The protein encoded by this gene belongs to the protein phosphatase 1 (PP1) inhibitor family. This protein is an inhibitor of smooth muscle myosin phosphatase, and has higher inhibitory activity when phosphorylated. Inhibition of myosin phosphatase leads to increased myosin phosphorylation and enhanced smooth muscle contraction. Alternatively spliced transcript variants encoding different isoforms have been noted for this gene. PPP1R14A NA
ENSG00000072952 10335 murine retrovirus integration site 1 homolog This gene is similar to a putative mouse tumor suppressor gene (Mrvi1) that is frequently disrupted by mouse AIDS-related virus (MRV). The encoded protein, which is found in the membrane of the endoplasmic reticulum, is similar to Jaw1, a lymphoid-restricted protein whose expression is down-regulated during lymphoid differentiation. This protein is a substrate of cGMP-dependent kinase-1 (PKG1) that can function as a regulator of IP3-induced calcium release. Studies in mouse suggest that MRV integration at Mrvi1 induces myeloid leukemia by altering the expression of a gene important for myeloid cell growth and/or differentiation, and thus this gene may function as a myeloid leukemia tumor suppressor gene. Several alternatively spliced transcript variants encoding different isoforms have been found for this gene, and alternative translation start sites, including a non-AUG (CUG) start site, are used. MRVI1 NA
ENSG00000079308 7145 tensin 1 The protein encoded by this gene localizes to focal adhesions, regions of the plasma membrane where the cell attaches to the extracellular matrix. This protein crosslinks actin filaments and contains a Src homology 2 (SH2) domain, which is often found in molecules involved in signal transduction. This protein is a substrate of calpain II. Alternative splicing results in multiple transcript variants encoding different isoforms. TNS1 NA
ENSG00000145824 9547 C-X-C motif chemokine ligand 14 This antimicrobial gene belongs to the cytokine gene family which encode secreted proteins involved in immunoregulatory and inflammatory processes. The protein encoded by this gene is structurally related to the CXC (Cys-X-Cys) subfamily of cytokines. Members of this subfamily are characterized by two cysteines separated by a single amino acid. This cytokine displays chemotactic activity for monocytes but not for lymphocytes, dendritic cells, neutrophils or macrophages. It has been implicated that this cytokine is involved in the homeostasis of monocyte-derived macrophages rather than in inflammation. CXCL14 NA
ENSG00000170477 3851 keratin 4 The protein encoded by this gene is a member of the keratin gene family. The type II cytokeratins consist of basic or neutral proteins which are arranged in pairs of heterotypic keratin chains coexpressed during differentiation of simple and stratified epithelial tissues. This type II cytokeratin is specifically expressed in differentiated layers of the mucosal and esophageal epithelia with family member KRT13. Mutations in these genes have been associated with White Sponge Nevus, characterized by oral, esophageal, and anal leukoplakia. The type II cytokeratins are clustered in a region of chromosome 12q12-q13. KRT4 NA
ENSG00000116133 1718 24-dehydrocholesterol reductase This gene encodes a flavin adenine dinucleotide (FAD)-dependent oxidoreductase which catalyzes the reduction of the delta-24 double bond of sterol intermediates during cholesterol biosynthesis. The protein contains a leader sequence that directs it to the endoplasmic reticulum membrane. Missense mutations in this gene have been associated with desmosterolosis. Also, reduced expression of the gene occurs in the temporal cortex of Alzheimer disease patients and overexpression has been observed in adrenal gland cancer cells. DHCR24 NA
ENSG00000198467 7169 tropomyosin 2 (beta) This gene encodes beta-tropomyosin, a member of the actin filament binding protein family, and mainly expressed in slow, type 1 muscle fibers. Mutations in this gene can alter the expression of other sarcomeric tropomyosin proteins, and cause cap disease, nemaline myopathy and distal arthrogryposis syndromes. Alternatively spliced transcript variants encoding different isoforms have been found for this gene. TPM2 NA
ENSG00000115414 2335 fibronectin 1 This gene encodes fibronectin, a glycoprotein present in a soluble dimeric form in plasma, and in a dimeric or multimeric form at the cell surface and in extracellular matrix. The encoded preproprotein is proteolytically processed to generate the mature protein. Fibronectin is involved in cell adhesion and migration processes including embryogenesis, wound healing, blood coagulation, host defense, and metastasis. The gene has three regions subject to alternative splicing, with the potential to produce 20 different transcript variants, at least one of which encodes an isoform that undergoes proteolytic processing. The full-length nature of some variants has not been determined. FN1 NA
ENSG00000148795 1586 cytochrome P450 family 17 subfamily A member 1 This gene encodes a member of the cytochrome P450 superfamily of enzymes. The cytochrome P450 proteins are monooxygenases which catalyze many reactions involved in drug metabolism and synthesis of cholesterol, steroids and other lipids. This protein localizes to the endoplasmic reticulum. It has both 17alpha-hydroxylase and 17,20-lyase activities and is a key enzyme in the steroidogenic pathway that produces progestins, mineralocorticoids, glucocorticoids, androgens, and estrogens. Mutations in this gene are associated with isolated steroid-17 alpha-hydroxylase deficiency, 17-alpha-hydroxylase/17,20-lyase deficiency, pseudohermaphroditism, and adrenal hyperplasia. CYP17A1 NA
ENSG00000128591 2318 filamin C This gene encodes one of three related filamin genes, specifically gamma filamin. These filamin proteins crosslink actin filaments into orthogonal networks in cortical cytoplasm and participate in the anchoring of membrane proteins for the actin cytoskeleton. Three functional domains exist in filamin: an N-terminal filamentous actin-binding domain, a C-terminal self-association domain, and a membrane glycoprotein-binding domain. Two transcript variants encoding different isoforms have been found for this gene. FLNC NA
ENSG00000196616 125 alcohol dehydrogenase 1B (class I), beta polypeptide The protein encoded by this gene is a member of the alcohol dehydrogenase family. Members of this enzyme family metabolize a wide variety of substrates, including ethanol, retinol, other aliphatic alcohols, hydroxysteroids, and lipid peroxidation products. This encoded protein, consisting of several homo- and heterodimers of alpha, beta, and gamma subunits, exhibits high activity for ethanol oxidation and plays a major role in ethanol catabolism. Three genes encoding alpha, beta and gamma subunits are tandemly organized in a genomic segment as a gene cluster. Two transcript variants encoding different isoforms have been found for this gene. ADH1B NA
ENSG00000157110 11030 RNA binding protein with multiple splicing This gene encodes a member of the RNA recognition motif family of RNA-binding proteins. The RNA recognition motif is between 80-100 amino acids in length and family members contain one to four copies of the motif. The RNA recognition motif consists of two short stretches of conserved sequence, as well as a few highly conserved hydrophobic residues. The encoded protein has a single, putative RNA recognition motif in its N-terminus. Alternative splicing results in multiple transcript variants encoding different isoforms. RBPMS NA
ENSG00000143546 6279 S100 calcium binding protein A8 The protein encoded by this gene is a member of the S100 family of proteins containing 2 EF-hand calcium-binding motifs. S100 proteins are localized in the cytoplasm and/or nucleus of a wide range of cells, and involved in the regulation of a number of cellular processes such as cell cycle progression and differentiation. S100 genes include at least 13 members which are located as a cluster on chromosome 1q21. This protein may function in the inhibition of casein kinase and as a cytokine. Altered expression of this protein is associated with the disease cystic fibrosis. Multiple transcript variants encoding different isoforms have been found for this gene. S100A8 NA
ENSG00000159176 1465 cysteine and glycine rich protein 1 This gene encodes a member of the cysteine-rich protein (CSRP) family. This gene family includes a group of LIM domain proteins, which may be involved in regulatory processes important for development and cellular differentiation. The LIM/double zinc-finger motif found in this gene product occurs in proteins with critical functions in gene regulation, cell growth, and somatic differentiation. Alternatively spliced transcript variants have been described. CSRP1 NA
ENSG00000106809 4969 osteoglycin This gene encodes a member of the small leucine-rich proteoglycan (SLRP) family of proteins. The encoded protein induces ectopic bone formation in conjunction with transforming growth factor beta and may regulate osteoblast differentiation. High expression of the encoded protein may be associated with elevated heart left ventricular mass. Alternative splicing results in multiple transcript variants. OGN NA
ENSG00000163209 6707 small proline rich protein 3 NA SPRR3 NA
ENSG00000197616 4624 myosin, heavy chain 6, cardiac muscle, alpha Cardiac muscle myosin is a hexamer consisting of two heavy chain subunits, two light chain subunits, and two regulatory subunits. This gene encodes the alpha heavy chain subunit of cardiac myosin. The gene is located 4kb downstream of the gene encoding the beta heavy chain subunit of cardiac myosin. Mutations in this gene cause familial hypertrophic cardiomyopathy and atrial septal defect 3. MYH6 NA
ENSG00000269936 ENSG00000269936 NA NA RP11-394O4.5 NA
ENSG00000049540 2006 elastin This gene encodes a protein that is one of the two components of elastic fibers. The encoded protein is rich in hydrophobic amino acids such as glycine and proline, which form mobile hydrophobic regions bounded by crosslinks between lysine residues. Deletions and mutations in this gene are associated with supravalvular aortic stenosis (SVAS) and autosomal dominant cutis laxa. Multiple transcript variants encoding different isoforms have been found for this gene. ELN NA
ENSG00000011465 1634 decorin This gene encodes a member of the small leucine-rich proteoglycan family of proteins. Alternative splicing results in multiple transcript variants, at least one of which encodes a preproprotein that is proteolytically processed to generate the mature protein. This protein plays a role in collagen fibril assembly. Binding of this protein to multiple cell surface receptors mediates its role in tumor suppression, including a stimulatory effect on autophagy and inflammation and an inhibitory effect on angiogenesis and tumorigenesis. This gene and the related gene biglycan are thought to be the result of a gene duplication. Mutations in this gene are associated with congenital stromal corneal dystrophy in human patients. DCN NA
ENSG00000203782 4014 loricrin This gene encodes loricrin, a major protein component of the cornified cell envelope found in terminally differentiated epidermal cells. Mutations in this gene are associated with Vohwinkel’s syndrome and progressive symmetric erythrokeratoderma, both inherited skin diseases. LOR NA
ENSG00000257017 3240 haptoglobin This gene encodes a preproprotein, which is processed to yield both alpha and beta chains, which subsequently combine as a tetramer to produce haptoglobin. Haptoglobin functions to bind free plasma hemoglobin, which allows degradative enzymes to gain access to the hemoglobin, while at the same time preventing loss of iron through the kidneys and protecting the kidneys from damage by hemoglobin. Mutations in this gene and/or its regulatory regions cause ahaptoglobinemia or hypohaptoglobinemia. This gene has also been linked to diabetic nephropathy, the incidence of coronary artery disease in type 1 diabetes, Crohn’s disease, inflammatory disease behavior, primary sclerosing cholangitis, susceptibility to idiopathic Parkinson’s disease, and a reduced incidence of Plasmodium falciparum malaria. The protein encoded also exhibits antimicrobial activity against bacteria. A similar duplicated gene is located next to this gene on chromosome 16. Multiple transcript variants encoding different isoforms have been found for this gene. HP NA
ENSG00000106123 2051 EPH receptor B6 This gene encodes a member of a family of transmembrane proteins that function as receptors for ephrin-B family proteins. Unlike other members of this family, the encoded protein does not contain a functional kinase domain. Activity of this protein can influence cell adhesion and migration. Expression of this gene is downregulated during tumor progression, suggesting that the protein may suppress tumor invasion and metastasis. Alternative splicing results in multiple transcript variants. EPHB6 NA
ENSG00000111341 4256 matrix Gla protein The protein encoded by this gene is secreted and likely acts as an inhibitor of bone formation. The encoded protein is found in the organic matrix of bone and cartilage. Defects in this gene are a cause of Keutel syndrome (KS). Two transcript variants encoding different isoforms have been found for this gene. MGP NA
ENSG00000077943 8516 integrin subunit alpha 8 Integrins are heterodimeric transmembrane receptor proteins that mediate numerous cellular processes including cell adhesion, cytoskeletal rearrangement, and activation of cell signaling pathways. Integrins are composed of alpha and beta subunits. This gene encodes the alpha 8 subunit of the heterodimeric integrin alpha8beta1 protein. The encoded protein is a single-pass type 1 membrane protein that contains multiple FG-GAP repeats. This repeat is predicted to fold into a beta propeller structure. This gene regulates the recruitment of mesenchymal cells into epithelial structures, mediates cell-cell interactions, and regulates neurite outgrowth of sensory and motor neurons. The integrin alpha8beta1 protein thus plays an important role in wound-healing and organogenesis. Mutations in this gene have been associated with renal hypodysplasia/aplasia-1 (RHDA1) and with several animal models of chronic kidney disease. Alternate splicing results in multiple transcript variants encoding distinct isoforms. ITGA8 NA
ENSG00000204388 3304 heat shock protein family A (Hsp70) member 1B This intronless gene encodes a 70kDa heat shock protein which is a member of the heat shock protein 70 family. In conjunction with other heat shock proteins, this protein stabilizes existing proteins against aggregation and mediates the folding of newly translated proteins in the cytosol and in organelles. It is also involved in the ubiquitin-proteasome pathway through interaction with the AU-rich element RNA-binding protein 1. The gene is located in the major histocompatibility complex class III region, in a cluster with two closely related genes which encode similar proteins. HSPA1B NA
ENSG00000080573 50509 collagen type V alpha 3 This gene encodes an alpha chain for one of the low abundance fibrillar collagens. Fibrillar collagen molecules are trimers that can be composed of one or more types of alpha chains. Type V collagen is found in tissues containing type I collagen and appears to regulate the assembly of heterotypic fibers composed of both type I and type V collagen. This gene product is closely related to type XI collagen and it is possible that the collagen chains of types V and XI constitute a single collagen type with tissue-specific chain combinations. Mutations in this gene are thought to be responsible for the symptoms of a subset of patients with Ehlers-Danlos syndrome type III. Messages of several sizes can be detected in northern blots but sequence information cannot confirm the identity of the shorter messages. COL5A3 NA
ENSG00000197746 5660 prosaposin This gene encodes a highly conserved preproprotein that is proteolytically processed to generate four main cleavage products including saposins A, B, C, and D. Each domain of the precursor protein is approximately 80 amino acid residues long with nearly identical placement of cysteine residues and glycosylation sites. Saposins A-D localize primarily to the lysosomal compartment where they facilitate the catabolism of glycosphingolipids with short oligosaccharide groups. The precursor protein exists both as a secretory protein and as an integral membrane protein and has neurotrophic activities. Mutations in this gene have been associated with Gaucher disease and metachromatic leukodystrophy. Alternative splicing results in multiple transcript variants, at least one of which encodes an isoform that is proteolytically processed. PSAP NA
ENSG00000178585 56998 catenin beta interacting protein 1 The protein encoded by this gene binds CTNNB1 and prevents interaction between CTNNB1 and TCF family members. The encoded protein is a negative regulator of the Wnt signaling pathway. Two transcript variants encoding the same protein have been found for this gene. CTNNBIP1 NA
ENSG00000068078 2261 fibroblast growth factor receptor 3 This gene encodes a member of the fibroblast growth factor receptor (FGFR) family, with its amino acid sequence being highly conserved between members and among divergent species. FGFR family members differ from one another in their ligand affinities and tissue distribution. A full-length representative protein would consist of an extracellular region, composed of three immunoglobulin-like domains, a single hydrophobic membrane-spanning segment and a cytoplasmic tyrosine kinase domain. The extracellular portion of the protein interacts with fibroblast growth factors, setting in motion a cascade of downstream signals, ultimately influencing mitogenesis and differentiation. This particular family member binds acidic and basic fibroblast growth hormone and plays a role in bone development and maintenance. Mutations in this gene lead to craniosynostosis and multiple types of skeletal dysplasia. Three alternatively spliced transcript variants that encode different protein isoforms have been described. FGFR3 NA
ENSG00000163631 213 albumin Albumin is a soluble, monomeric protein which comprises about one-half of the blood serum protein. Albumin functions primarily as a carrier protein for steroids, fatty acids, and thyroid hormones and plays a role in stabilizing extracellular fluid volume. Albumin is a globular unglycosylated serum protein of molecular weight 65,000. Albumin is synthesized in the liver as preproalbumin which has an N-terminal peptide that is removed before the nascent protein is released from the rough endoplasmic reticulum. The product, proalbumin, is in turn cleaved in the Golgi vesicles to produce the secreted albumin. ALB NA
ENSG00000087266 6452 SH3 domain binding protein 2 The protein encoded by this gene has an N-terminal pleckstrin homology (PH) domain, an SH3-binding proline-rich region, and a C-terminal SH2 domain. The protein binds to the SH3 domains of several proteins including the ABL1 and SYK protein tyrosine kinases, and functions as a cytoplasmic adaptor protein to positively regulate transcriptional activity in T, natural killer (NK), and basophilic cells. Mutations in this gene result in cherubism. Multiple transcript variants encoding different isoforms have been found for this gene. SH3BP2 NA
ENSG00000259627 ENSG00000259627 NA NA RP11-244F12.2 NA
ENSG00000137857 53905 dual oxidase 1 The protein encoded by this gene is a glycoprotein and a member of the NADPH oxidase family. The synthesis of thyroid hormone is catalyzed by a protein complex located at the apical membrane of thyroid follicular cells. This complex contains an iodide transporter, thyroperoxidase, and a peroxide generating system that includes proteins encoded by this gene and the similar DUOX2 gene. This protein is known as dual oxidase because it has both a peroxidase homology domain and a gp91phox domain. This protein generates hydrogen peroxide and thereby plays a role in the activity of thyroid peroxidase, lactoperoxidase, and in lactoperoxidase-mediated antimicrobial defense at mucosal surfaces. Two alternatively spliced transcript variants encoding the same protein have been described for this gene. DUOX1 NA
ENSG00000140416 7168 tropomyosin 1 (alpha) This gene is a member of the tropomyosin family of highly conserved, widely distributed actin-binding proteins involved in the contractile system of striated and smooth muscles and the cytoskeleton of non-muscle cells. Tropomyosin is composed of two alpha-helical chains arranged as a coiled-coil. It is polymerized end to end along the two grooves of actin filaments and provides stability to the filaments. The encoded protein is one type of alpha helical chain that forms the predominant tropomyosin of striated muscle, where it also functions in association with the troponin complex to regulate the calcium-dependent interaction of actin and myosin during muscle contraction. In smooth muscle and non-muscle cells, alternatively spliced transcript variants encoding a range of isoforms have been described. Mutations in this gene are associated with type 3 familial hypertrophic cardiomyopathy. TPM1 NA
ENSG00000237973 ENSG00000237973 MT-CO1 pseudogene 12 NA MTCO1P12 NA
ENSG00000178372 51806 calmodulin like 5 This gene encodes a novel calcium binding protein expressed in the epidermis and related to the calmodulin family of calcium binding proteins. Functional studies with recombinant protein demonstrate it does bind calcium and undergoes a conformational change when it does so. Abundant expression is detected only in reconstructed epidermis and is restricted to differentiating keratinocytes. In addition, it can associate with transglutaminase 3, shown to be a key enzyme in the terminal differentiation of keratinocytes. CALML5 NA
ENSG00000172023 5968 regenerating family member 1 beta This gene is a type I subclass member of the Reg gene family. The Reg gene family is a multigene family grouped into four subclasses, types I, II, III and IV based on the primary structures of the encoded proteins. This gene encodes a protein secreted by the exocrine pancreas that is highly similar to the REG1A protein. The related REG1A protein is associated with islet cell regeneration and diabetogenesis, and may be involved in pancreatic lithogenesis. Reg family members REG1A, REGL, PAP and this gene are tandemly clustered on chromosome 2p12 and may have arisen from the same ancestral gene by gene duplication. REG1B NA
ENSG00000159251 70 actin, alpha, cardiac muscle 1 Actins are highly conserved proteins that are involved in various types of cell motility. Polymerization of globular actin (G-actin) leads to a structural filament (F-actin) in the form of a two-stranded helix. Each actin can bind to four others. The protein encoded by this gene belongs to the actin family which is comprised of three main groups of actin isoforms, alpha, beta, and gamma. The alpha actins are found in muscle tissues and are a major constituent of the contractile apparatus. Defects in this gene have been associated with idiopathic dilated cardiomyopathy (IDC) and familial hypertrophic cardiomyopathy (FHC). ACTC1 NA
ENSG00000065534 4638 myosin light chain kinase This gene, a muscle member of the immunoglobulin gene superfamily, encodes myosin light chain kinase which is a calcium/calmodulin dependent enzyme. This kinase phosphorylates myosin regulatory light chains to facilitate myosin interaction with actin filaments to produce contractile activity. This gene encodes both smooth muscle and nonmuscle isoforms. In addition, using a separate promoter in an intron in the 3’ region, it encodes telokin, a small protein identical in sequence to the C-terminus of myosin light chain kinase, that is independently expressed in smooth muscle and functions to stabilize unphosphorylated myosin filaments. A pseudogene is located on the p arm of chromosome 3. Four transcript variants that produce four isoforms of the calcium/calmodulin dependent enzyme have been identified as well as two transcripts that produce two isoforms of telokin. Additional variants have been identified but lack full length transcripts. MYLK NA
ENSG00000072110 87 actinin alpha 1 Alpha actinins belong to the spectrin gene superfamily which represents a diverse group of cytoskeletal proteins, including the alpha and beta spectrins and dystrophins. Alpha actinin is an actin-binding protein with multiple roles in different cell types. In nonmuscle cells, the cytoskeletal isoform is found along microfilament bundles and adherens-type junctions, where it is involved in binding actin to the membrane. In contrast, skeletal, cardiac, and smooth muscle isoforms are localized to the Z-disc and analogous dense bodies, where they help anchor the myofibrillar actin filaments. This gene encodes a nonmuscle, cytoskeletal, alpha actinin isoform and maps to the same site as the structurally similar erythroid beta spectrin gene. Three transcript variants encoding different isoforms have been found for this gene. ACTN1 NA
ENSG00000077782 2260 fibroblast growth factor receptor 1 The protein encoded by this gene is a member of the fibroblast growth factor receptor (FGFR) family, where amino acid sequence is highly conserved between members and throughout evolution. FGFR family members differ from one another in their ligand affinities and tissue distribution. A full-length representative protein consists of an extracellular region, composed of three immunoglobulin-like domains, a single hydrophobic membrane-spanning segment and a cytoplasmic tyrosine kinase domain. The extracellular portion of the protein interacts with fibroblast growth factors, setting in motion a cascade of downstream signals, ultimately influencing mitogenesis and differentiation. This particular family member binds both acidic and basic fibroblast growth factors and is involved in limb induction. Mutations in this gene have been associated with Pfeiffer syndrome, Jackson-Weiss syndrome, Antley-Bixler syndrome, osteoglophonic dysplasia, and autosomal dominant Kallmann syndrome 2. Chromosomal aberrations involving this gene are associated with stem cell myeloproliferative disorder and stem cell leukemia lymphoma syndrome. Alternatively spliced variants which encode different protein isoforms have been described; however, not all variants have been fully characterized. FGFR1 NA
ENSG00000163017 72 actin, gamma 2, smooth muscle, enteric Actins are highly conserved proteins that are involved in various types of cell motility and in the maintenance of the cytoskeleton. Three types of actins, alpha, beta and gamma, have been identified in vertebrates. Alpha actins are found in muscle tissues and are a major constituent of the contractile apparatus. The beta and gamma actins co-exist in most cell types as components of the cytoskeleton and as mediators of internal cell motility. This gene encodes actin gamma 2; a smooth muscle actin found in enteric tissues. Alternative splicing results in multiple transcript variants encoding distinct isoforms. Based on similarity to peptide cleavage of related actins, the mature protein of this gene is formed by removal of two N-terminal peptides. ACTG2 NA
ENSG00000161634 117159 dermcidin This antimicrobial gene encodes a secreted protein that is subsequently processed into mature peptides of distinct biological activities. The C-terminal peptide is constitutively expressed in sweat and has antibacterial and antifungal activities. The N-terminal peptide, also known as diffusible survival evasion peptide, promotes neural cell survival under conditions of severe oxidative stress. A glycosylated form of the N-terminal peptide may be associated with cachexia (muscle wasting) in cancer patients. Alternative splicing results in multiple transcript variants encoding different isoforms. DCD NA
ENSG00000143126 1952 cadherin EGF LAG seven-pass G-type receptor 2 The protein encoded by this gene is a member of the flamingo subfamily, part of the cadherin superfamily. The flamingo subfamily consists of nonclassic-type cadherins; a subpopulation that does not interact with catenins. The flamingo cadherins are located at the plasma membrane and have nine cadherin domains, seven epidermal growth factor-like repeats and two laminin A G-type repeats in their ectodomain. They also have seven transmembrane domains, a characteristic unique to this subfamily. It is postulated that these proteins are receptors involved in contact-mediated communication, with cadherin domains acting as homophilic binding regions and the EGF-like domains involved in cell adhesion and receptor-ligand interactions. The specific function of this particular member has not been determined. CELSR2 NA
ENSG00000156113 3778 potassium calcium-activated channel subfamily M alpha 1 MaxiK channels are large conductance, voltage and calcium-sensitive potassium channels which are fundamental to the control of smooth muscle tone and neuronal excitability. MaxiK channels can be formed by 2 subunits: the pore-forming alpha subunit, which is the product of this gene, and the modulatory beta subunit. Intracellular calcium regulates the physical association between the alpha and beta subunits. Alternatively spliced transcript variants encoding different isoforms have been identified. KCNMA1 NA
ENSG00000081277 5317 plakophilin 1 This gene encodes a member of the arm-repeat (armadillo) and plakophilin gene families. Plakophilin proteins contain numerous armadillo repeats, localize to cell desmosomes and nuclei, and participate in linking cadherins to intermediate filaments in the cytoskeleton. This protein may be involved in molecular recruitment and stabilization during desmosome formation. Mutations in this gene have been associated with the ectodermal dysplasia/skin fragility syndrome. Two transcript variants encoding different isoforms have been found for this gene. PKP1 NA
ENSG00000135046 301 annexin A1 This gene encodes a membrane-localized protein that binds phospholipids. This protein inhibits phospholipase A2 and has anti-inflammatory activity. Loss of function or expression of this gene has been detected in multiple tumors. ANXA1 NA
ENSG00000132470 3691 integrin subunit beta 4 Integrins are heterodimers comprised of alpha and beta subunits, that are noncovalently associated transmembrane glycoprotein receptors. Different combinations of alpha and beta polypeptides form complexes that vary in their ligand-binding specificities. Integrins mediate cell-matrix or cell-cell adhesion, and transduced signals that regulate gene expression and cell growth. This gene encodes the integrin beta 4 subunit, a receptor for the laminins. This subunit tends to associate with alpha 6 subunit and is likely to play a pivotal role in the biology of invasive carcinoma. Mutations in this gene are associated with epidermolysis bullosa with pyloric atresia. Multiple alternatively spliced transcript variants encoding distinct isoforms have been found for this gene. ITGB4 NA
ENSG00000106772 158471 prune homolog 2 The protein encoded by this gene belongs to the B-cell CLL/lymphoma 2 and adenovirus E1B 19 kDa interacting family, whose members play roles in many cellular processes including apoptosis, cell transformation, and synaptic function. Several functions for this protein have been demonstrated including suppression of Ras homolog family member A activity, which results in reduced stress fiber formation and suppression of oncogenic cellular transformation. A high molecular weight isoform of this protein has also been shown to colocalize with Adaptor protein complex 2, beta-Adaptin and endodermal markers, suggesting an involvement in post-endocytic trafficking. In prostate cancer cells, this gene acts as a tumor suppressor and its expression is regulated by prostate cancer antigen 3, a non-protein coding gene on the opposite DNA strand in an intron of this gene. Prostate cancer antigen 3 regulates levels of this gene through formation of a double-stranded RNA that undergoes adenosine deaminase acting on RNA-dependent adenosine-to-inosine RNA editing. Alternative splicing results in multiple transcript variants. PRUNE2 NA
ENSG00000113140 6678 secreted protein acidic and cysteine rich This gene encodes a cysteine-rich acidic matrix-associated protein. The encoded protein is required for the collagen in bone to become calcified but is also involved in extracellular matrix synthesis and promotion of changes to cell shape. The gene product has been associated with tumor suppression but has also been correlated with metastasis based on changes to cell shape which can promote tumor cell invasion. Three transcript variants encoding different isoforms have been found for this gene. SPARC NA
ENSG00000138735 8654 phosphodiesterase 5A This gene encodes a cGMP-binding, cGMP-specific phosphodiesterase, a member of the cyclic nucleotide phosphodiesterase family. This phosphodiesterase specifically hydrolyzes cGMP to 5’-GMP. It is involved in the regulation of intracellular concentrations of cyclic nucleotides and is important for smooth muscle relaxation in the cardiovascular system. Alternative splicing of this gene results in three transcript variants encoding distinct isoforms. PDE5A NA
ENSG00000122786 800 caldesmon 1 This gene encodes a calmodulin- and actin-binding protein that plays an essential role in the regulation of smooth muscle and nonmuscle contraction. The conserved domain of this protein possesses the binding activities to Ca(2+)-calmodulin, actin, tropomyosin, myosin, and phospholipids. This protein is a potent inhibitor of the actin-tropomyosin activated myosin MgATPase, and serves as a mediating factor for Ca(2+)-dependent inhibition of smooth muscle contraction. Alternative splicing of this gene results in multiple transcript variants encoding distinct isoforms. CALD1 NA
ENSG00000256309 NA NA NA NA TRUE
ENSG00000103034 65009 NDRG family member 4 This gene is a member of the N-myc downregulated gene family which belongs to the alpha/beta hydrolase superfamily. The protein encoded by this gene is a cytoplasmic protein that is required for cell cycle progression and survival in primary astrocytes and may be involved in the regulation of mitogenic signalling in vascular smooth muscle cells. Alternative splicing results in multiple transcripts encoding different isoforms. NDRG4 NA
ENSG00000125730 718 complement component 3 Complement component C3 plays a central role in the activation of complement system. Its activation is required for both classical and alternative complement activation pathways. The encoded preproprotein is proteolytically processed to generate alpha and beta subunits that form the mature protein, which is then further processed to generate numerous peptide products. The C3a peptide, also known as the C3a anaphylatoxin, modulates inflammation and possesses antimicrobial activity. Mutations in this gene are associated with atypical hemolytic uremic syndrome and age-related macular degeneration in human patients. C3 NA
ENSG00000187605 200424 tet methylcytosine dioxygenase 3 Members of the ten-eleven translocation (TET) gene family, including TET3, play a role in the DNA methylation process (Langemeijer et al., 2009 [PubMed 19923888]). TET3 NA
ENSG00000122304 5620 protamine 2 Protamines substitute for histones in the chromatin of sperm during the haploid phase of spermatogenesis, and are the major DNA-binding proteins in the nucleus of sperm in many vertebrates. They package the sperm DNA into a highly condensed complex in a volume less than 5% of a somatic cell nucleus. Many mammalian species have only one protamine (protamine 1); however, a few species, including human and mouse, have two. This gene encodes protamine 2, which is cleaved to give rise to a family of protamine 2 peptides. Alternatively spliced transcript variants have also been found for this gene. PRM2 NA
ENSG00000108828 10493 vesicle amine transport 1 Synaptic vesicles are responsible for regulating the storage and release of neurotransmitters in the nerve terminal. The protein encoded by this gene is an abundant integral membrane protein of cholinergic synaptic vesicles and is thought to be involved in vesicular transport. It belongs to the quinone oxidoreductase subfamily of zinc-containing alcohol dehydrogenase proteins. VAT1 NA
ENSG00000136153 4008 LIM domain 7 This gene encodes a protein containing a calponin homology (CH) domain, a PDZ domain, and a LIM domain, and may be involved in protein-protein interactions. Several alternatively spliced transcript variants encoding different isoforms have been found for this gene, however, the full-length nature of some variants is not known. LMO7 NA
ENSG00000186081 3852 keratin 5 The protein encoded by this gene is a member of the keratin gene family. The type II cytokeratins consist of basic or neutral proteins which are arranged in pairs of heterotypic keratin chains coexpressed during differentiation of simple and stratified epithelial tissues. This type II cytokeratin is specifically expressed in the basal layer of the epidermis with family member KRT14. Mutations in these genes have been associated with a complex of diseases termed epidermolysis bullosa simplex. The type II cytokeratins are clustered in a region of chromosome 12q12-q13. KRT5 NA
ENSG00000173801 3728 junction plakoglobin This gene encodes a major cytoplasmic protein which is the only known constituent common to submembranous plaques of both desmosomes and intermediate junctions. This protein forms distinct complexes with cadherins and desmosomal cadherins and is a member of the catenin family since it contains a distinct repeating amino acid motif called the armadillo repeat. Mutation in this gene has been associated with Naxos disease. Alternative splicing occurs in this gene; however, not all transcripts have been fully described. JUP NA
ENSG00000175183 1466 cysteine and glycine rich protein 2 CSRP2 is a member of the CSRP family of genes, encoding a group of LIM domain proteins, which may be involved in regulatory processes important for development and cellular differentiation. CRP2 contains two copies of the cysteine-rich amino acid sequence motif (LIM) with putative zinc-binding activity, and may be involved in regulating ordered cell growth. Other genes in the family include CSRP1 and CSRP3. Alternative splicing results in multiple transcript variants. CSRP2 NA
ENSG00000185201 10581 interferon induced transmembrane protein 2 NA IFITM2 NA
ENSG00000164266 6690 serine peptidase inhibitor, Kazal type 1 The protein encoded by this gene is a trypsin inhibitor, which is secreted from pancreatic acinar cells into pancreatic juice. It is thought to function in the prevention of trypsin-catalyzed premature activation of zymogens within the pancreas and the pancreatic duct. Mutations in this gene are associated with hereditary pancreatitis and tropical calcific pancreatitis. SPINK1 NA
ENSG00000118194 7139 troponin T2, cardiac type The protein encoded by this gene is the tropomyosin-binding subunit of the troponin complex, which is located on the thin filament of striated muscles and regulates muscle contraction in response to alterations in intracellular calcium ion concentration. Mutations in this gene have been associated with familial hypertrophic cardiomyopathy as well as with dilated cardiomyopathy. Transcripts for this gene undergo alternative splicing that results in many tissue-specific isoforms, however, the full-length nature of some of these variants has not yet been determined. TNNT2 NA
ENSG00000009307 7812 cold shock domain containing E1 NA CSDE1 NA
ENSG00000182871 80781 collagen type XVIII alpha 1 chain This gene encodes the alpha chain of type XVIII collagen. This collagen is one of the multiplexins, extracellular matrix proteins that contain multiple triple-helix domains (collagenous domains) interrupted by non-collagenous domains. A long isoform of the protein has an N-terminal domain that is homologous to the extracellular part of frizzled receptors. Proteolytic processing at several endogenous cleavage sites in the C-terminal domain results in production of endostatin, a potent antiangiogenic protein that is able to inhibit angiogenesis and tumor growth. Mutations in this gene are associated with Knobloch syndrome. The main features of this syndrome involve retinal abnormalities, so type XVIII collagen may play an important role in retinal structure and in neural tube closure. Alternative splicing results in multiple transcript variants. COL18A1 NA
ENSG00000155657 7273 titin This gene encodes a large abundant protein of striated muscle. The product of this gene is divided into two regions, a N-terminal I-band and a C-terminal A-band. The I-band, which is the elastic part of the molecule, contains two regions of tandem immunoglobulin domains on either side of a PEVK region that is rich in proline, glutamate, valine and lysine. The A-band, which is thought to act as a protein-ruler, contains a mixture of immunoglobulin and fibronectin repeats, and possesses kinase activity. An N-terminal Z-disc region and a C-terminal M-line region bind to the Z-line and M-line of the sarcomere, respectively, so that a single titin molecule spans half the length of a sarcomere. Titin also contains binding sites for muscle associated proteins so it serves as an adhesion template for the assembly of contractile machinery in muscle cells. It has also been identified as a structural protein for chromosomes. Alternative splicing of this gene results in multiple transcript variants. Considerable variability exists in the I-band, the M-line and the Z-disc regions of titin. Variability in the I-band region contributes to the differences in elasticity of different titin isoforms and, therefore, to the differences in elasticity of different muscle types. Mutations in this gene are associated with familial hypertrophic cardiomyopathy 9, and autoantibodies to titin are produced in patients with the autoimmune disease scleroderma. TTN NA
ENSG00000163754 2992 glycogenin 1 This gene encodes a member of the glycogenin family. Glycogenin is a glycosyltransferase that catalyzes the formation of a short glucose polymer from uridine diphosphate glucose in an autoglucosylation reaction. This reaction is followed by elongation and branching of the polymer, catalyzed by glycogen synthase and branching enzyme, to form glycogen. This gene is expressed in muscle and other tissues. Mutations in this gene result in glycogen storage disease XV. This gene has pseudogenes on chromosomes 1, 8 and 13 respectively. Alternatively spliced transcript variants encoding different isoforms have been identified. GYG1 NA
ENSG00000120885 1191 clusterin The protein encoded by this gene is a secreted chaperone that can under some stress conditions also be found in the cell cytosol. It has been suggested to be involved in several basic biological events such as cell death, tumor progression, and neurodegenerative disorders. Alternate splicing results in both coding and non-coding variants. CLU NA
ENSG00000101335 10398 myosin light chain 9 Myosin, a structural component of muscle, consists of two heavy chains and four light chains. The protein encoded by this gene is a myosin light chain that may regulate muscle contraction by modulating the ATPase activity of myosin heads. The encoded protein binds calcium and is activated by myosin light chain kinase. Two transcript variants encoding different isoforms have been found for this gene. MYL9 NA
ENSG00000109610 6649 superoxide dismutase 3, extracellular This gene encodes a member of the superoxide dismutase (SOD) protein family. SODs are antioxidant enzymes that catalyze the conversion of superoxide radicals into hydrogen peroxide and oxygen, which may protect the brain, lungs, and other tissues from oxidative stress. Proteolytic processing of the encoded protein results in the formation of two distinct homotetramers that differ in their ability to interact with the extracellular matrix (ECM). Homotetramers consisting of the intact protein, or type C subunit, exhibit high affinity for heparin and are anchored to the ECM. Homotetramers consisting of a proteolytically cleaved form of the protein, or type A subunit, exhibit low affinity for heparin and do not interact with the ECM. A mutation in this gene may be associated with increased heart disease risk. SOD3 NA
ENSG00000143536 49860 cornulin This gene encodes a member of the ‘fused gene’ family of proteins, which contain N-terminus EF-hand domains and multiple tandem peptide repeats. The encoded protein contains two EF-hand Ca2+ binding domains in its N-terminus and two glutamine- and threonine-rich 60 amino acid repeats in its C-terminus. This gene, also known as squamous epithelial heat shock protein 53, may play a role in the mucosal/epithelial immune response and epidermal differentiation. CRNN NA
ENSG00000185532 5592 protein kinase, cGMP-dependent, type I Mammals have three different isoforms of cyclic GMP-dependent protein kinase (Ialpha, Ibeta, and II). These PRKG isoforms act as key mediators of the nitric oxide/cGMP signaling pathway and are important components of many signal transduction processes in diverse cell types. This PRKG1 gene on human chromosome 10 encodes the soluble Ialpha and Ibeta isoforms of PRKG by alternative transcript splicing. A separate gene on human chromosome 4, PRKG2, encodes the membrane-bound PRKG isoform II. The PRKG1 proteins play a central role in regulating cardiovascular and neuronal functions in addition to relaxing smooth muscle tone, preventing platelet aggregation, and modulating cell growth. This gene is most strongly expressed in all types of smooth muscle, platelets, cerebellar Purkinje cells, hippocampal neurons, and the lateral amygdala. Isoforms Ialpha and Ibeta have identical cGMP-binding and catalytic domains but differ in their leucine/isoleucine zipper and autoinhibitory sequences and therefore differ in their dimerization substrates and kinase enzyme activity. PRKG1 NA
ENSG00000132329 10267 receptor activity modifying protein 1 The protein encoded by this gene is a member of the RAMP family of single-transmembrane-domain proteins, called receptor (calcitonin) activity modifying proteins (RAMPs). RAMPs are type I transmembrane proteins with an extracellular N terminus and a cytoplasmic C terminus. RAMPs are required to transport calcitonin-receptor-like receptor (CRLR) to the plasma membrane. CRLR, a receptor with seven transmembrane domains, can function as either a calcitonin-gene-related peptide (CGRP) receptor or an adrenomedullin receptor, depending on which members of the RAMP family are expressed. In the presence of this (RAMP1) protein, CRLR functions as a CGRP receptor. The RAMP1 protein is involved in the terminal glycosylation, maturation, and presentation of the CGRP receptor to the cell surface. Alternative splicing results in multiple transcript variants encoding different isoforms. RAMP1 NA
ENSG00000244734 3043 hemoglobin subunit beta The alpha (HBA) and beta (HBB) loci determine the structure of the 2 types of polypeptide chains in adult hemoglobin, Hb A. The normal adult hemoglobin tetramer consists of two alpha chains and two beta chains. Mutant beta globin causes sickle cell anemia. Absence of beta chain causes beta-zero-thalassemia. Reduced amounts of detectable beta globin causes beta-plus-thalassemia. The order of the genes in the beta-globin cluster is 5’-epsilon – gamma-G – gamma-A – delta – beta–3’. HBB NA
ENSG00000147872 123 perilipin 2 The protein encoded by this gene belongs to the perilipin family, members of which coat intracellular lipid storage droplets. This protein is associated with the lipid globule surface membrane material, and may be involved in development and maintenance of adipose tissue. However, it is not restricted to adipocytes as previously thought, but is found in a wide range of cultured cell lines, including fibroblasts, endothelial and epithelial cells, and tissues, such as lactating mammary gland, adrenal cortex, Sertoli and Leydig cells, and hepatocytes in alcoholic liver cirrhosis, suggesting that it may serve as a marker of lipid accumulation in diverse cell types and diseases. Alternatively spliced transcript variants have been found for this gene. PLIN2 NA
ENSG00000159069 54461 F-box and WD repeat domain containing 5 This gene encodes a member of the F-box protein family, members of which are characterized by an approximately 40 amino acid motif, the F-box. The F-box proteins constitute one of the four subunits of ubiquitin protein ligase complex called SCFs (SKP1-cullin-F-box), which function in phosphorylation-dependent ubiquitination. The F-box proteins are divided into three classes: Fbws containing WD-40 domains, Fbls containing leucine-rich repeats, and Fbxs containing either different protein-protein interaction modules or no recognizable motifs. The protein encoded by this gene contains WD-40 domains, in addition to an F-box motif, so it belongs to the Fbw class. Alternatively spliced transcript variants encoding distinct isoforms have been identified for this gene, however, they were found to be nonsense-mediated mRNA decay (NMD) candidates, hence not represented. FBXW5 NA
# Save the queried Ensembl gene IDs for factor 13, one ID per line.
write.table(as.factor(out$query),
            paste0("../utilities/GTEX2013_sparse_fac_sqrt/gene_names_clus_", 13, ".txt"),
            col.names = FALSE, row.names = FALSE, quote = FALSE);
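
The same write step is repeated for every factor with only the index changing, so it could be wrapped in a small helper. The following is a minimal sketch, not part of the original analysis: the function name `write_factor_genes` and its arguments are hypothetical, and a temporary directory stands in for `../utilities/GTEX2013_sparse_fac_sqrt`.

```r
# Hypothetical helper (an assumption, not the author's code): write one
# factor's queried Ensembl IDs to "gene_names_clus_<k>.txt", one ID per line.
write_factor_genes <- function(ids, factor_index, out_dir) {
  path <- file.path(out_dir, paste0("gene_names_clus_", factor_index, ".txt"))
  # as.character() also handles factor input by writing the level labels,
  # giving the same one-ID-per-line layout as the write.table() call above.
  writeLines(as.character(ids), path)
  invisible(path)
}

# Usage with two IDs taken from the factor 13 table.
ids <- c("ENSG00000163631", "ENSG00000087266")
path <- write_factor_genes(ids, 13, tempdir())
readLines(path)
```

In the report itself, `ids` would be `out$query` and `out_dir` the utilities directory used above.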

Factor 14 Annotations

out <- mygene::queryMany(gene_list[14,],  scopes="ensembl.gene", fields=c("name", "summary", "symbol"), species="human");
## Finished
kable(as.data.frame(out))
symbol X_id query name summary
S100A9 6280 ENSG00000163220 S100 calcium binding protein A9 The protein encoded by this gene is a member of the S100 family of proteins containing 2 EF-hand calcium-binding motifs. S100 proteins are localized in the cytoplasm and/or nucleus of a wide range of cells, and involved in the regulation of a number of cellular processes such as cell cycle progression and differentiation. S100 genes include at least 13 members which are located as a cluster on chromosome 1q21. This protein may function in the inhibition of casein kinase and altered expression of this protein is associated with the disease cystic fibrosis. This antimicrobial protein exhibits antifungal and antibacterial activity.
HBB 3043 ENSG00000244734 hemoglobin subunit beta The alpha (HBA) and beta (HBB) loci determine the structure of the 2 types of polypeptide chains in adult hemoglobin, Hb A. The normal adult hemoglobin tetramer consists of two alpha chains and two beta chains. Mutant beta globin causes sickle cell anemia. Absence of beta chain causes beta-zero-thalassemia. Reduced amounts of detectable beta globin causes beta-plus-thalassemia. The order of the genes in the beta-globin cluster is 5’-epsilon – gamma-G – gamma-A – delta – beta–3’.
S100A8 6279 ENSG00000143546 S100 calcium binding protein A8 The protein encoded by this gene is a member of the S100 family of proteins containing 2 EF-hand calcium-binding motifs. S100 proteins are localized in the cytoplasm and/or nucleus of a wide range of cells, and involved in the regulation of a number of cellular processes such as cell cycle progression and differentiation. S100 genes include at least 13 members which are located as a cluster on chromosome 1q21. This protein may function in the inhibition of casein kinase and as a cytokine. Altered expression of this protein is associated with the disease cystic fibrosis. Multiple transcript variants encoding different isoforms have been found for this gene.
HBA2 3040 ENSG00000188536 hemoglobin subunit alpha 2 The human alpha globin gene cluster located on chromosome 16 spans about 30 kb and includes seven loci: 5’- zeta - pseudozeta - mu - pseudoalpha-1 - alpha-2 - alpha-1 - theta - 3’. The alpha-2 (HBA2) and alpha-1 (HBA1) coding sequences are identical. These genes differ slightly over the 5’ untranslated regions and the introns, but they differ significantly over the 3’ untranslated regions. Two alpha chains plus two beta chains constitute HbA, which in normal adult life comprises about 97% of the total hemoglobin; alpha chains combine with delta chains to constitute HbA-2, which with HbF (fetal hemoglobin) makes up the remaining 3% of adult hemoglobin. Alpha thalassemias result from deletions of each of the alpha genes as well as deletions of both HBA2 and HBA1; some nondeletion alpha thalassemias have also been reported.
KRT13 3860 ENSG00000171401 keratin 13 The protein encoded by this gene is a member of the keratin gene family. The keratins are intermediate filament proteins responsible for the structural integrity of epithelial cells and are subdivided into cytokeratins and hair keratins. Most of the type I cytokeratins consist of acidic proteins which are arranged in pairs of heterotypic keratin chains. This type I cytokeratin is paired with keratin 4 and expressed in the suprabasal layers of non-cornified stratified epithelia. Mutations in this gene and keratin 4 have been associated with the autosomal dominant disorder White Sponge Nevus. The type I cytokeratins are clustered in a region of chromosome 17q21.2. Alternative splicing of this gene results in multiple transcript variants; however, not all variants have been described.
REG1A 5967 ENSG00000115386 regenerating family member 1 alpha This gene is a type I subclass member of the Reg gene family. The Reg gene family is a multigene family grouped into four subclasses, types I, II, III and IV, based on the primary structures of the encoded proteins. This gene encodes a protein that is secreted by the exocrine pancreas. It is associated with islet cell regeneration and diabetogenesis and may be involved in pancreatic lithogenesis. Reg family members REG1B, REGL, PAP and this gene are tandemly clustered on chromosome 2p12 and may have arisen from the same ancestral gene by gene duplication.
KRT4 3851 ENSG00000170477 keratin 4 The protein encoded by this gene is a member of the keratin gene family. The type II cytokeratins consist of basic or neutral proteins which are arranged in pairs of heterotypic keratin chains coexpressed during differentiation of simple and stratified epithelial tissues. This type II cytokeratin is specifically expressed in differentiated layers of the mucosal and esophageal epithelia with family member KRT13. Mutations in these genes have been associated with White Sponge Nevus, characterized by oral, esophageal, and anal leukoplakia. The type II cytokeratins are clustered in a region of chromosome 12q12-q13.
SVIL 6840 ENSG00000197321 supervillin This gene encodes a bipartite protein with distinct amino- and carboxy-terminal domains. The amino-terminus contains nuclear localization signals and the carboxy-terminus contains numerous consecutive sequences with extensive similarity to proteins in the gelsolin family of actin-binding proteins, which cap, nucleate, and/or sever actin filaments. The gene product is tightly associated with both actin filaments and plasma membranes, suggesting a role as a high-affinity link between the actin cytoskeleton and the membrane. The encoded protein appears to aid in both myosin II assembly during cell spreading and disassembly of focal adhesions. Several transcript variants encoding different isoforms of supervillin have been described.
DES 1674 ENSG00000175084 desmin This gene encodes a muscle-specific class III intermediate filament. Homopolymers of this protein form a stable intracytoplasmic filamentous network connecting myofibrils to each other and to the plasma membrane. Mutations in this gene are associated with desmin-related myopathy, a familial cardiac and skeletal myopathy (CSM), and with distal myopathies.
MYO1F 4542 ENSG00000142347 myosin IF NA
C10orf54 64115 ENSG00000107738 chromosome 10 open reading frame 54 NA
CSF3R 1441 ENSG00000119535 colony stimulating factor 3 receptor The protein encoded by this gene is the receptor for colony stimulating factor 3, a cytokine that controls the production, differentiation, and function of granulocytes. The encoded protein, which is a member of the family of cytokine receptors, may also function in some cell surface adhesion or recognition processes. Alternatively spliced transcript variants have been described. Mutations in this gene are a cause of Kostmann syndrome, also known as severe congenital neutropenia.
CKM 1158 ENSG00000104879 creatine kinase, M-type The protein encoded by this gene is a cytoplasmic enzyme involved in energy homeostasis and is an important serum marker for myocardial infarction. The encoded protein reversibly catalyzes the transfer of phosphate between ATP and various phosphogens such as creatine phosphate. It acts as a homodimer in striated muscle as well as in other tissues, and as a heterodimer with a similar brain isozyme in heart. The encoded protein is a member of the ATP:guanido phosphotransferase protein family.
MKNK2 2872 ENSG00000099875 MAP kinase interacting serine/threonine kinase 2 This gene encodes a member of the calcium/calmodulin-dependent protein kinases (CAMK) Ser/Thr protein kinase family, which belongs to the protein kinase superfamily. This protein contains conserved DLG (asp-leu-gly) and ENIL (glu-asn-ile-leu) motifs, and an N-terminal polybasic region which binds importin A and the translation factor scaffold protein eukaryotic initiation factor 4G (eIF4G). This protein is one of the downstream kinases activated by mitogen-activated protein (MAP) kinases. It phosphorylates the eukaryotic initiation factor 4E (eIF4E), thus playing important roles in the initiation of mRNA translation, oncogenic transformation and malignant cell proliferation. In addition to eIF4E, this protein also interacts with von Hippel-Lindau tumor suppressor (VHL), ring-box 1 (Rbx1) and Cullin2 (Cul2), which are all components of the CBC(VHL) ubiquitin ligase E3 complex. Multiple alternatively spliced transcript variants have been found, but the full-length nature and biological activity of only two variants are determined. These two variants encode distinct isoforms which differ in activity and regulation, and in subcellular localization.
COL4A2 1284 ENSG00000134871 collagen type IV alpha 2 This gene encodes one of the six subunits of type IV collagen, the major structural component of basement membranes. The C-terminal portion of the protein, known as canstatin, is an inhibitor of angiogenesis and tumor growth. Like the other members of the type IV collagen gene family, this gene is organized in a head-to-head conformation with another type IV collagen gene so that each gene pair shares a common promoter.
GAPDH 2597 ENSG00000111640 glyceraldehyde-3-phosphate dehydrogenase This gene encodes a member of the glyceraldehyde-3-phosphate dehydrogenase protein family. The encoded protein has been identified as a moonlighting protein based on its ability to perform mechanistically distinct functions. The product of this gene catalyzes an important energy-yielding step in carbohydrate metabolism, the reversible oxidative phosphorylation of glyceraldehyde-3-phosphate in the presence of inorganic phosphate and nicotinamide adenine dinucleotide (NAD). The encoded protein has additionally been identified to have uracil DNA glycosylase activity in the nucleus. Also, this protein contains a peptide that has antimicrobial activity against E. coli, P. aeruginosa, and C. albicans. Studies of a similar protein in mouse have assigned a variety of additional functions including nitrosylation of nuclear proteins, the regulation of mRNA stability, and acting as a transferrin receptor on the cell surface of macrophage. Many pseudogenes similar to this locus are present in the human genome. Alternative splicing results in multiple transcript variants.
TPM3 7170 ENSG00000143549 tropomyosin 3 This gene encodes a member of the tropomyosin family of actin-binding proteins. Tropomyosins are dimers of coiled-coil proteins that provide stability to actin filaments and regulate access of other actin-binding proteins. Mutations in this gene result in autosomal dominant nemaline myopathy and other muscle disorders. This locus is involved in translocations with other loci, including anaplastic lymphoma receptor tyrosine kinase (ALK) and neurotrophic tyrosine kinase receptor type 1 (NTRK1), which result in the formation of fusion proteins that act as oncogenes. There are numerous pseudogenes for this gene on different chromosomes. Alternative splicing results in multiple transcript variants.
MEDAG 84935 ENSG00000102802 mesenteric estrogen dependent adipogenesis NA
FLOT2 2319 ENSG00000132589 flotillin 2 Caveolae are small domains on the inner cell membrane involved in vesicular trafficking and signal transduction. This gene encodes a caveolae-associated, integral membrane protein, which is thought to function in neuronal signaling.
FLNC 2318 ENSG00000128591 filamin C This gene encodes one of three related filamin genes, specifically gamma filamin. These filamin proteins crosslink actin filaments into orthogonal networks in cortical cytoplasm and participate in the anchoring of membrane proteins for the actin cytoskeleton. Three functional domains exist in filamin: an N-terminal filamentous actin-binding domain, a C-terminal self-association domain, and a membrane glycoprotein-binding domain. Two transcript variants encoding different isoforms have been found for this gene.
SERPINE1 5054 ENSG00000106366 serpin family E member 1 This gene encodes a member of the serine proteinase inhibitor (serpin) superfamily. This member is the principal inhibitor of tissue plasminogen activator (tPA) and urokinase (uPA), and hence is an inhibitor of fibrinolysis. Defects in this gene are the cause of plasminogen activator inhibitor-1 deficiency (PAI-1 deficiency), and high concentrations of the gene product are associated with thrombophilia. Alternatively spliced transcript variants encoding different isoforms have been found for this gene.
MYH7 4625 ENSG00000092054 myosin, heavy chain 7, cardiac muscle, beta Muscle myosin is a hexameric protein containing 2 heavy chain subunits, 2 alkali light chain subunits, and 2 regulatory light chain subunits. This gene encodes the beta (or slow) heavy chain subunit of cardiac myosin. It is expressed predominantly in normal human ventricle. It is also expressed in skeletal muscle tissues rich in slow-twitch type I muscle fibers. Changes in the relative abundance of this protein and the alpha (or fast) heavy subunit of cardiac myosin correlate with the contractile velocity of cardiac muscle. Its expression is also altered during thyroid hormone depletion and hemodynamic overloading. Mutations in this gene are associated with familial hypertrophic cardiomyopathy, myosin storage myopathy, dilated cardiomyopathy, and Laing early-onset distal myopathy.
COL1A1 1277 ENSG00000108821 collagen type I alpha 1 This gene encodes the pro-alpha1 chains of type I collagen whose triple helix comprises two alpha1 chains and one alpha2 chain. Type I is a fibril-forming collagen found in most connective tissues and is abundant in bone, cornea, dermis and tendon. Mutations in this gene are associated with osteogenesis imperfecta types I-IV, Ehlers-Danlos syndrome type VIIA, Ehlers-Danlos syndrome Classical type, Caffey Disease and idiopathic osteoporosis. Reciprocal translocations between chromosomes 17 and 22, where this gene and the gene for platelet-derived growth factor beta are located, are associated with a particular type of skin tumor called dermatofibrosarcoma protuberans, resulting from unregulated expression of the growth factor. Two transcripts, resulting from the use of alternate polyadenylation signals, have been identified for this gene.
EHBP1L1 254102 ENSG00000173442 EH domain binding protein 1 like 1 NA
AC019349.5 ENSG00000229732 ENSG00000229732 NA NA
HBA1 3039 ENSG00000206172 hemoglobin subunit alpha 1 The human alpha globin gene cluster located on chromosome 16 spans about 30 kb and includes seven loci: 5’- zeta - pseudozeta - mu - pseudoalpha-1 - alpha-2 - alpha-1 - theta - 3’. The alpha-2 (HBA2) and alpha-1 (HBA1) coding sequences are identical. These genes differ slightly over the 5’ untranslated regions and the introns, but they differ significantly over the 3’ untranslated regions. Two alpha chains plus two beta chains constitute HbA, which in normal adult life comprises about 97% of the total hemoglobin; alpha chains combine with delta chains to constitute HbA-2, which with HbF (fetal hemoglobin) makes up the remaining 3% of adult hemoglobin. Alpha thalassemias result from deletions of each of the alpha genes as well as deletions of both HBA2 and HBA1; some nondeletion alpha thalassemias have also been reported.
CSTA 1475 ENSG00000121552 cystatin A The cystatin superfamily encompasses proteins that contain multiple cystatin-like sequences. Some of the members are active cysteine protease inhibitors, while others have lost or perhaps never acquired this inhibitory activity. There are three inhibitory families in the superfamily, including the type 1 cystatins (stefins), type 2 cystatins, and kininogens. This gene encodes a stefin that functions as a cysteine protease inhibitor, forming tight complexes with papain and the cathepsins B, H, and L. The protein is one of the precursor proteins of cornified cell envelope in keratinocytes and plays a role in epidermal development and maintenance. Stefins have been proposed as prognostic and diagnostic tools for cancer.
IL1RN 3557 ENSG00000136689 interleukin 1 receptor antagonist The protein encoded by this gene is a member of the interleukin 1 cytokine family. This protein inhibits the activities of interleukin 1, alpha (IL1A) and interleukin 1, beta (IL1B), and modulates a variety of interleukin 1 related immune and inflammatory responses. This gene and five other closely related cytokine genes form a gene cluster spanning approximately 400 kb on chromosome 2. A polymorphism of this gene is reported to be associated with increased risk of osteoporotic fractures and gastric cancer. Several alternatively spliced transcript variants encoding distinct isoforms have been reported.
TTN 7273 ENSG00000155657 titin This gene encodes a large abundant protein of striated muscle. The product of this gene is divided into two regions, a N-terminal I-band and a C-terminal A-band. The I-band, which is the elastic part of the molecule, contains two regions of tandem immunoglobulin domains on either side of a PEVK region that is rich in proline, glutamate, valine and lysine. The A-band, which is thought to act as a protein-ruler, contains a mixture of immunoglobulin and fibronectin repeats, and possesses kinase activity. An N-terminal Z-disc region and a C-terminal M-line region bind to the Z-line and M-line of the sarcomere, respectively, so that a single titin molecule spans half the length of a sarcomere. Titin also contains binding sites for muscle associated proteins so it serves as an adhesion template for the assembly of contractile machinery in muscle cells. It has also been identified as a structural protein for chromosomes. Alternative splicing of this gene results in multiple transcript variants. Considerable variability exists in the I-band, the M-line and the Z-disc regions of titin. Variability in the I-band region contributes to the differences in elasticity of different titin isoforms and, therefore, to the differences in elasticity of different muscle types. Mutations in this gene are associated with familial hypertrophic cardiomyopathy 9, and autoantibodies to titin are produced in patients with the autoimmune disease scleroderma.
RHOG 391 ENSG00000177105 ras homolog family member G This gene encodes a member of the Rho family of small GTPases, which cycle between inactive GDP-bound and active GTP-bound states and function as molecular switches in signal transduction cascades. Rho proteins promote reorganization of the actin cytoskeleton and regulate cell shape, attachment, and motility. The encoded protein facilitates translocation of a functional guanine nucleotide exchange factor (GEF) complex from the cytoplasm to the plasma membrane where ras-related C3 botulinum toxin substrate 1 is activated to promote lamellipodium formation and cell migration. Two related pseudogenes have been identified on chromosomes 20 and X.
SPRR3 6707 ENSG00000163209 small proline rich protein 3 NA
SPI1 6688 ENSG00000066336 Spi-1 proto-oncogene This gene encodes an ETS-domain transcription factor that activates gene expression during myeloid and B-lymphoid cell development. The nuclear protein binds to a purine-rich sequence known as the PU-box found near the promoters of target genes, and regulates their expression in coordination with other transcription factors and cofactors. The protein can also regulate alternative splicing of target genes. Multiple transcript variants encoding different isoforms have been found for this gene.
NCF4 4689 ENSG00000100365 neutrophil cytosolic factor 4 The protein encoded by this gene is a cytosolic regulatory component of the superoxide-producing phagocyte NADPH-oxidase, a multicomponent enzyme system important for host defense. This protein is preferentially expressed in cells of myeloid lineage. It interacts primarily with neutrophil cytosolic factor 2 (NCF2/p67-phox) to form a complex with neutrophil cytosolic factor 1 (NCF1/p47-phox), which further interacts with the small G protein RAC1 and translocates to the membrane upon cell stimulation. This complex then activates flavocytochrome b, the membrane-integrated catalytic core of the enzyme system. The PX domain of this protein can bind phospholipid products of the PI(3) kinase, which suggests its role in PI(3) kinase-mediated signaling events. The phosphorylation of this protein was found to negatively regulate the enzyme activity. Alternatively spliced transcript variants encoding distinct isoforms have been observed.
CSTB 1476 ENSG00000160213 cystatin B The cystatin superfamily encompasses proteins that contain multiple cystatin-like sequences. Some of the members are active cysteine protease inhibitors, while others have lost or perhaps never acquired this inhibitory activity. There are three inhibitory families in the superfamily, including the type 1 cystatins (stefins), type 2 cystatins and kininogens. This gene encodes a stefin that functions as an intracellular thiol protease inhibitor. The protein is able to form a dimer stabilized by noncovalent forces, inhibiting papain and cathepsins L, H and B. The protein is thought to play a role in protecting against the proteases leaking from lysosomes. Evidence indicates that mutations in this gene are responsible for the primary defects in patients with progressive myoclonic epilepsy (EPM1).
ALDOA 226 ENSG00000149925 aldolase, fructose-bisphosphate A The protein encoded by this gene, Aldolase A (fructose-bisphosphate aldolase), is a glycolytic enzyme that catalyzes the reversible conversion of fructose-1,6-bisphosphate to glyceraldehyde 3-phosphate and dihydroxyacetone phosphate. Three aldolase isozymes (A, B, and C), encoded by three different genes, are differentially expressed during development. Aldolase A is found in the developing embryo and is produced in even greater amounts in adult muscle. Aldolase A expression is repressed in adult liver, kidney and intestine and similar to aldolase C levels in brain and other nervous tissue. Aldolase A deficiency has been associated with myopathy and hemolytic anemia. Alternative splicing and alternative promoter usage results in multiple transcript variants. Related pseudogenes have been identified on chromosomes 3 and 10.
KRT7 3855 ENSG00000135480 keratin 7 The protein encoded by this gene is a member of the keratin gene family. The type II cytokeratins consist of basic or neutral proteins which are arranged in pairs of heterotypic keratin chains coexpressed during differentiation of simple and stratified epithelial tissues. This type II cytokeratin is specifically expressed in the simple epithelia lining the cavities of the internal organs and in the gland ducts and blood vessels. The genes encoding the type II cytokeratins are clustered in a region of chromosome 12q12-q13. Alternative splicing may result in several transcript variants; however, not all variants have been fully described.
A2M 2 ENSG00000175899 alpha-2-macroglobulin Alpha-2-macroglobulin is a protease inhibitor and cytokine transporter. It inhibits many proteases, including trypsin, thrombin and collagenase. A2M is implicated in Alzheimer disease (AD) due to its ability to mediate the clearance and degradation of A-beta, the major component of beta-amyloid deposits.
HP 3240 ENSG00000257017 haptoglobin This gene encodes a preproprotein, which is processed to yield both alpha and beta chains, which subsequently combine as a tetramer to produce haptoglobin. Haptoglobin functions to bind free plasma hemoglobin, which allows degradative enzymes to gain access to the hemoglobin, while at the same time preventing loss of iron through the kidneys and protecting the kidneys from damage by hemoglobin. Mutations in this gene and/or its regulatory regions cause ahaptoglobinemia or hypohaptoglobinemia. This gene has also been linked to diabetic nephropathy, the incidence of coronary artery disease in type 1 diabetes, Crohn’s disease, inflammatory disease behavior, primary sclerosing cholangitis, susceptibility to idiopathic Parkinson’s disease, and a reduced incidence of Plasmodium falciparum malaria. The protein encoded also exhibits antimicrobial activity against bacteria. A similar duplicated gene is located next to this gene on chromosome 16. Multiple transcript variants encoding different isoforms have been found for this gene.
GP2 2813 ENSG00000169347 glycoprotein 2 This gene encodes an integral membrane protein that is secreted from intracellular zymogen granules and associates with the plasma membrane via glycosylphosphatidylinositol (GPI) linkage. The encoded protein binds pathogens such as enterobacteria, thereby playing an important role in the innate immune response. The C-terminus of this protein is related to the C-terminus of the protein encoded by the neighboring gene, uromodulin (UMOD). Alternative splicing results in multiple transcript variants.
ATG16L2 89849 ENSG00000168010 autophagy related 16 like 2 NA
TALDO1 6888 ENSG00000177156 transaldolase 1 Transaldolase 1 is a key enzyme of the nonoxidative pentose phosphate pathway providing ribose-5-phosphate for nucleic acid synthesis and NADPH for lipid biosynthesis. This pathway can also maintain glutathione at a reduced state and thus protect sulfhydryl groups and cellular integrity from oxygen radicals. The functional gene of transaldolase 1 is located on chromosome 11 and a pseudogene is identified on chromosome 1 but there are conflicting map locations. The second and third exon of this gene were developed by insertion of a retrotransposable element. This gene is thought to be involved in multiple sclerosis.
SELPLG 6404 ENSG00000110876 selectin P ligand This gene encodes a glycoprotein that functions as a high affinity counter-receptor for the cell adhesion molecules P-, E- and L- selectin expressed on myeloid cells and stimulated T lymphocytes. As such, this protein plays a critical role in leukocyte trafficking during inflammation by tethering of leukocytes to activated platelets or endothelia expressing selectins. This protein requires two post-translational modifications, tyrosine sulfation and the addition of the sialyl Lewis x tetrasaccharide (sLex) to its O-linked glycans, for its high-affinity binding activity. Aberrant expression of this gene and polymorphisms in this gene are associated with defects in the innate and adaptive immune response. Alternate splicing results in multiple transcript variants.
PYGM 5837 ENSG00000068976 phosphorylase, glycogen, muscle This gene encodes a muscle enzyme involved in glycogenolysis. Highly similar enzymes encoded by different genes are found in liver and brain. Mutations in this gene are associated with McArdle disease (myophosphorylase deficiency), a glycogen storage disease of muscle. Alternative splicing results in multiple transcript variants.
MXD1 4084 ENSG00000059728 MAX dimerization protein 1 This gene encodes a member of the MYC/MAX/MAD network of basic helix-loop-helix leucine zipper transcription factors. The MYC/MAX/MAD transcription factors mediate cellular proliferation, differentiation and apoptosis. The encoded protein antagonizes MYC-mediated transcriptional activation of target genes by competing for the binding partner MAX and recruiting repressor complexes containing histone deacetylases. Mutations in this gene may play a role in acute leukemia, and the encoded protein is a potential tumor suppressor. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene.
REG1B 5968 ENSG00000172023 regenerating family member 1 beta This gene is a type I subclass member of the Reg gene family. The Reg gene family is a multigene family grouped into four subclasses, types I, II, III and IV based on the primary structures of the encoded proteins. This gene encodes a protein secreted by the exocrine pancreas that is highly similar to the REG1A protein. The related REG1A protein is associated with islet cell regeneration and diabetogenesis, and may be involved in pancreatic lithogenesis. Reg family members REG1A, REGL, PAP and this gene are tandemly clustered on chromosome 2p12 and may have arisen from the same ancestral gene by gene duplication.
PACSIN3 29763 ENSG00000165912 protein kinase C and casein kinase substrate in neurons 3 This gene is a member of the protein kinase C and casein kinase substrate in neurons family. The encoded protein is involved in linking the actin cytoskeleton with vesicle formation. Alternative splicing results in multiple transcript variants.
GPSM3 63940 ENSG00000213654 G-protein signaling modulator 3 NA
RBM38 55544 ENSG00000132819 RNA binding motif protein 38 NA
ATP1A1 476 ENSG00000163399 ATPase Na+/K+ transporting subunit alpha 1 The protein encoded by this gene belongs to the family of P-type cation transport ATPases, and to the subfamily of Na+/K+ -ATPases. Na+/K+ -ATPase is an integral membrane protein responsible for establishing and maintaining the electrochemical gradients of Na and K ions across the plasma membrane. These gradients are essential for osmoregulation, for sodium-coupled transport of a variety of organic and inorganic molecules, and for electrical excitability of nerve and muscle. This enzyme is composed of two subunits, a large catalytic subunit (alpha) and a smaller glycoprotein subunit (beta). The catalytic subunit of Na+/K+ -ATPase is encoded by multiple genes. This gene encodes an alpha 1 subunit. Multiple transcript variants encoding different isoforms have been found for this gene.
ATP2A2 488 ENSG00000174437 ATPase sarcoplasmic/endoplasmic reticulum Ca2+ transporting 2 This gene encodes one of the SERCA Ca(2+)-ATPases, which are intracellular pumps located in the sarcoplasmic or endoplasmic reticula of muscle cells. This enzyme catalyzes the hydrolysis of ATP coupled with the translocation of calcium from the cytosol into the sarcoplasmic reticulum lumen, and is involved in regulation of the contraction/relaxation cycle. Mutations in this gene cause Darier-White disease, also known as keratosis follicularis, an autosomal dominant skin disorder characterized by loss of adhesion between epidermal cells and abnormal keratinization. Alternative splicing results in multiple transcript variants encoding different isoforms.
TGM2 7052 ENSG00000198959 transglutaminase 2 Transglutaminases are enzymes that catalyze the crosslinking of proteins by epsilon-gamma glutamyl lysine isopeptide bonds. While the primary structure of transglutaminases is not conserved, they all have the same amino acid sequence at their active sites and their activity is calcium-dependent. The protein encoded by this gene acts as a monomer, is induced by retinoic acid, and appears to be involved in apoptosis. Finally, the encoded protein is the autoantigen implicated in celiac disease. Two transcript variants encoding different isoforms have been found for this gene.
TYROBP 7305 ENSG00000011600 TYRO protein tyrosine kinase binding protein This gene encodes a transmembrane signaling polypeptide which contains an immunoreceptor tyrosine-based activation motif (ITAM) in its cytoplasmic domain. The encoded protein may associate with the killer-cell inhibitory receptor (KIR) family of membrane glycoproteins and may act as an activating signal transduction element. This protein may bind zeta-chain (TCR) associated protein kinase 70kDa (ZAP-70) and spleen tyrosine kinase (SYK) and play a role in signal transduction, bone modeling, brain myelination, and inflammation. Mutations within this gene have been associated with polycystic lipomembranous osteodysplasia with sclerosing leukoencephalopathy (PLOSL), also known as Nasu-Hakola disease. Its putative receptor, triggering receptor expressed on myeloid cells 2 (TREM2), also causes PLOSL. Multiple alternative transcript variants encoding distinct isoforms have been identified for this gene.
RAB10 10890 ENSG00000084733 RAB10, member RAS oncogene family RAB10 belongs to the RAS (see HRAS; MIM 190020) superfamily of small GTPases. RAB proteins localize to exocytic and endocytic compartments and regulate intracellular vesicle trafficking (Bao et al., 1998 [PubMed 9918381]).
SPINK1 6690 ENSG00000164266 serine peptidase inhibitor, Kazal type 1 The protein encoded by this gene is a trypsin inhibitor, which is secreted from pancreatic acinar cells into pancreatic juice. It is thought to function in the prevention of trypsin-catalyzed premature activation of zymogens within the pancreas and the pancreatic duct. Mutations in this gene are associated with hereditary pancreatitis and tropical calcific pancreatitis.
PI3 5266 ENSG00000124102 peptidase inhibitor 3 This gene encodes an elastase-specific inhibitor that functions as an antimicrobial peptide against Gram-positive and Gram-negative bacteria, and fungal pathogens. The protein contains a WAP-type four-disulfide core (WFDC) domain, and is thus a member of the WFDC domain family. Most WFDC gene members are localized to chromosome 20q12-q13 in two clusters: centromeric and telomeric. This gene belongs to the centromeric cluster. Expression of this gene is upregulated by bacterial lipopolysaccharides and cytokines.
CYP17A1 1586 ENSG00000148795 cytochrome P450 family 17 subfamily A member 1 This gene encodes a member of the cytochrome P450 superfamily of enzymes. The cytochrome P450 proteins are monooxygenases which catalyze many reactions involved in drug metabolism and synthesis of cholesterol, steroids and other lipids. This protein localizes to the endoplasmic reticulum. It has both 17alpha-hydroxylase and 17,20-lyase activities and is a key enzyme in the steroidogenic pathway that produces progestins, mineralocorticoids, glucocorticoids, androgens, and estrogens. Mutations in this gene are associated with isolated steroid-17 alpha-hydroxylase deficiency, 17-alpha-hydroxylase/17,20-lyase deficiency, pseudohermaphroditism, and adrenal hyperplasia.
HSPB1 3315 ENSG00000106211 heat shock protein family B (small) member 1 The protein encoded by this gene is induced by environmental stress and developmental changes. The encoded protein is involved in stress resistance and actin organization and translocates from the cytoplasm to the nucleus upon stress induction. Defects in this gene are a cause of Charcot-Marie-Tooth disease type 2F (CMT2F) and distal hereditary motor neuropathy (dHMN).
UNC13D 201294 ENSG00000092929 unc-13 homolog D This gene encodes a protein that is a member of the UNC13 family, containing similar domain structure as other family members but lacking an N-terminal phorbol ester-binding C1 domain present in other Munc13 proteins. The protein appears to play a role in vesicle maturation during exocytosis and is involved in regulation of cytolytic granules secretion. Mutations in this gene are associated with familial hemophagocytic lymphohistiocytosis type 3, a genetically heterogeneous, rare autosomal recessive disorder.
MYBPC1 4604 ENSG00000196091 myosin binding protein C, slow type This gene encodes a member of the myosin-binding protein C family. Myosin-binding protein C family members are myosin-associated proteins found in the cross-bridge-bearing zone (C region) of A bands in striated muscle. The encoded protein is the slow skeletal muscle isoform of myosin-binding protein C and plays an important role in muscle contraction by recruiting muscle-type creatine kinase to myosin filaments. Mutations in this gene are associated with distal arthrogryposis type I. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene.
MYL3 4634 ENSG00000160808 myosin light chain 3 MYL3 encodes myosin light chain 3, an alkali light chain also referred to in the literature as both the ventricular isoform and the slow skeletal muscle isoform. Mutations in MYL3 have been identified as a cause of mid-left ventricular chamber type hypertrophic cardiomyopathy.
PRSS1 5644 ENSG00000204983 protease, serine 1 This gene encodes a trypsinogen, which is a member of the trypsin family of serine proteases. This enzyme is secreted by the pancreas and cleaved to its active form in the small intestine. It is active on peptide linkages involving the carboxyl group of lysine or arginine. Mutations in this gene are associated with hereditary pancreatitis. This gene and several other trypsinogen genes are localized to the T cell receptor beta locus on chromosome 7.
HCK 3055 ENSG00000101336 HCK proto-oncogene, Src family tyrosine kinase The protein encoded by this gene is a member of the Src family of tyrosine kinases. This protein is primarily hemopoietic, particularly in cells of the myeloid and B-lymphoid lineages. It may help couple the Fc receptor to the activation of the respiratory burst. In addition, it may play a role in neutrophil migration and in the degranulation of neutrophils. Multiple isoforms with different subcellular distributions are produced due to both alternative splicing and the use of alternative translation initiation codons, including a non-AUG (CUG) codon.
ABTB1 80325 ENSG00000114626 ankyrin repeat and BTB domain containing 1 This gene encodes a protein with an ankyrin repeat region and two BTB/POZ domains, which are thought to be involved in protein-protein interactions. Expression of this gene is activated by the phosphatase and tensin homolog, a tumor suppressor. Alternate splicing results in three transcript variants.
CPA1 1357 ENSG00000091704 carboxypeptidase A1 This gene encodes a member of the carboxypeptidase A family of zinc metalloproteases. This enzyme is produced in the pancreas and preferentially cleaves C-terminal branched-chain and aromatic amino acids from dietary proteins. This gene and several family members are present in a gene cluster on chromosome 7. Mutations in this gene may be linked to chronic pancreatitis, while elevated protein levels may be associated with pancreatic cancer.
GPX3 2878 ENSG00000211445 glutathione peroxidase 3 This gene product belongs to the glutathione peroxidase family, which functions in the detoxification of hydrogen peroxide. It contains a selenocysteine (Sec) residue at its active site. The selenocysteine is encoded by the UGA codon, which normally signals translation termination. The 3’ UTR of Sec-containing genes have a common stem-loop structure, the sec insertion sequence (SECIS), which is necessary for the recognition of UGA as a Sec codon rather than as a stop signal.
PTGDS 5730 ENSG00000107317 prostaglandin D2 synthase The protein encoded by this gene is a glutathione-independent prostaglandin D synthase that catalyzes the conversion of prostaglandin H2 (PGH2) to prostaglandin D2 (PGD2). PGD2 functions as a neuromodulator as well as a trophic factor in the central nervous system. PGD2 is also involved in smooth muscle contraction/relaxation and is a potent inhibitor of platelet aggregation. This gene is preferentially expressed in brain. Studies with transgenic mice overexpressing this gene suggest that this gene may also be involved in the regulation of non-rapid eye movement sleep.
SYNM 23336 ENSG00000182253 synemin The protein encoded by this gene is an intermediate filament (IF) family member. IF proteins are cytoskeletal proteins that confer resistance to mechanical stress and are encoded by a dispersed multigene family. This protein has been found to form a linkage between desmin, which is a subunit of the IF network, and the extracellular matrix, and provides an important structural support in muscle. Two alternatively spliced variants encoding different isoforms have been described for this gene.
NEB 4703 ENSG00000183091 nebulin This gene encodes nebulin, a giant protein component of the cytoskeletal matrix that coexists with the thick and thin filaments within the sarcomeres of skeletal muscle. In most vertebrates, nebulin accounts for 3 to 4% of the total myofibrillar protein. The encoded protein contains approximately 30-amino acid long modules that can be classified into 7 types and other repeated modules. Protein isoform sizes vary from 600 to 800 kD due to alternative splicing that is tissue-, species-, and developmental stage-specific. Of the 183 exons in the nebulin gene, at least 43 are alternatively spliced, although exons 143 and 144 are not found in the same transcript. Of the several thousand transcript variants predicted for nebulin, the RefSeq Project has decided to create three representative RefSeq records. Mutations in this gene are associated with recessive nemaline myopathy.
TNNI1 7135 ENSG00000159173 troponin I1, slow skeletal type Troponin proteins associate with tropomyosin and regulate the calcium sensitivity of the myofibril contractile apparatus of striated muscles. Troponin I (TnI), along with troponin T (TnT) and troponin C (TnC), is one of 3 subunits that form the troponin complex of the thin filaments of striated muscle. TnI is the inhibitory subunit, blocking actin-myosin interactions and thereby mediating striated muscle relaxation. The TnI subfamily contains three genes: TnI-skeletal-fast-twitch, TnI-skeletal-slow-twitch, and TnI-cardiac. The TnI-fast and TnI-slow genes are expressed in fast-twitch and slow-twitch skeletal muscle fibers, respectively, while the TnI-cardiac gene is expressed exclusively in cardiac muscle tissue. This gene encodes the Troponin-I-skeletal-slow-twitch protein. This gene is expressed in cardiac and skeletal muscle during early development but is restricted to slow-twitch skeletal muscle fibers in adults. The encoded protein prevents muscle contraction by inhibiting calcium-mediated conformational changes in actin-myosin complexes.
CRNN 49860 ENSG00000143536 cornulin This gene encodes a member of the ‘fused gene’ family of proteins, which contain N-terminus EF-hand domains and multiple tandem peptide repeats. The encoded protein contains two EF-hand Ca2+ binding domains in its N-terminus and two glutamine- and threonine-rich 60 amino acid repeats in its C-terminus. This gene, also known as squamous epithelial heat shock protein 53, may play a role in the mucosal/epithelial immune response and epidermal differentiation.
CRIM1 51232 ENSG00000150938 cysteine rich transmembrane BMP regulator 1 (chordin-like) This gene encodes a transmembrane protein containing six cysteine-rich repeat domains and an insulin-like growth factor-binding domain. The encoded protein may play a role in tissue development through interactions with members of the transforming growth factor beta family, such as bone morphogenetic proteins.
CELA3A 10136 ENSG00000142789 chymotrypsin like elastase family member 3A Elastases form a subfamily of serine proteases that hydrolyze many proteins in addition to elastin. Humans have six elastase genes which encode the structurally similar proteins elastase 1, 2, 2A, 2B, 3A, and 3B. Unlike other elastases, elastase 3A has little elastolytic activity. Like most of the human elastases, elastase 3A is secreted from the pancreas as a zymogen and, like other serine proteases such as trypsin, chymotrypsin and kallikrein, it has a digestive function in the intestine. Elastase 3A preferentially cleaves proteins after alanine residues. Elastase 3A may also function in the intestinal transport and metabolism of cholesterol. Both elastase 3A and elastase 3B have been referred to as protease E and as elastase 1.
APP 351 ENSG00000142192 amyloid beta precursor protein This gene encodes a cell surface receptor and transmembrane precursor protein that is cleaved by secretases to form a number of peptides. Some of these peptides are secreted and can bind to the acetyltransferase complex APBB1/TIP60 to promote transcriptional activation, while others form the protein basis of the amyloid plaques found in the brains of patients with Alzheimer disease. In addition, two of the peptides are antimicrobial peptides, having been shown to have bactericidal and antifungal activities. Mutations in this gene have been implicated in autosomal dominant Alzheimer disease and cerebroarterial amyloidosis (cerebral amyloid angiopathy). Multiple transcript variants encoding several different isoforms have been found for this gene.
FGA 2243 ENSG00000171560 fibrinogen alpha chain This gene encodes the alpha subunit of the coagulation factor fibrinogen, which is a component of the blood clot. Following vascular injury, the encoded preproprotein is proteolytically processed by thrombin during the conversion of fibrinogen to fibrin. Mutations in this gene lead to several disorders, including dysfibrinogenemia, hypofibrinogenemia, afibrinogenemia and renal amyloidosis. Alternative splicing results in multiple transcript variants, at least one of which encodes an isoform that undergoes proteolytic processing.
SYNPO2 171024 ENSG00000172403 synaptopodin 2 NA
HADHA 3030 ENSG00000084754 hydroxyacyl-CoA dehydrogenase/3-ketoacyl-CoA thiolase/enoyl-CoA hydratase (trifunctional protein), alpha subunit This gene encodes the alpha subunit of the mitochondrial trifunctional protein, which catalyzes the last three steps of mitochondrial beta-oxidation of long chain fatty acids. The mitochondrial membrane-bound heterocomplex is composed of four alpha and four beta subunits, with the alpha subunit catalyzing the 3-hydroxyacyl-CoA dehydrogenase and enoyl-CoA hydratase activities. Mutations in this gene result in trifunctional protein deficiency or LCHAD deficiency. The genes of the alpha and beta subunits of the mitochondrial trifunctional protein are located adjacent to each other in the human genome in a head-to-head orientation.
CPA2 1358 ENSG00000158516 carboxypeptidase A2 Three different forms of human pancreatic procarboxypeptidase A have been isolated. The encoded protein represents the A2 form, which is a monomeric protein with different biochemical properties from the A1 and A3 forms. The A2 form of pancreatic procarboxypeptidase acts on aromatic C-terminal residues and is a secreted protein.
GFPT1 2673 ENSG00000198380 glutamine–fructose-6-phosphate transaminase 1 This gene encodes the first and rate-limiting enzyme of the hexosamine pathway and controls the flux of glucose into the hexosamine pathway. The product of this gene catalyzes the formation of glucosamine 6-phosphate.
KRT15 3866 ENSG00000171346 keratin 15 The protein encoded by this gene is a member of the keratin gene family. The keratins are intermediate filament proteins responsible for the structural integrity of epithelial cells and are subdivided into cytokeratins and hair keratins. Most of the type I cytokeratins consist of acidic proteins which are arranged in pairs of heterotypic keratin chains and are clustered in a region on chromosome 17q21.2.
MB 4151 ENSG00000198125 myoglobin This gene encodes a member of the globin superfamily and is expressed in skeletal and cardiac muscles. The encoded protein is a haemoprotein contributing to intracellular oxygen storage and transcellular facilitated diffusion of oxygen. At least three alternatively spliced transcript variants encoding the same protein have been reported.
FCER1G 2207 ENSG00000158869 Fc fragment of IgE receptor Ig The high affinity IgE receptor is a key molecule involved in allergic reactions. It is a tetramer composed of 1 alpha, 1 beta, and 2 gamma chains. The gamma chains are also subunits of other Fc receptors.
YBX3 8531 ENSG00000060138 Y-box binding protein 3 NA
ARHGAP9 64333 ENSG00000123329 Rho GTPase activating protein 9 This gene encodes a member of the Rho-GAP family of GTPase activating proteins. The protein has substantial GAP activity towards several Rho-family GTPases in vitro, converting them to an inactive GDP-bound state. It is implicated in regulating adhesion of hematopoietic cells to the extracellular matrix. Multiple transcript variants encoding different isoforms have been found for this gene.
MYOZ1 58529 ENSG00000177791 myozenin 1 The protein encoded by this gene is primarily expressed in the skeletal muscle, and belongs to the myozenin family. Members of this family function as calcineurin-interacting proteins that help tether calcineurin to the sarcomere of cardiac and skeletal muscle. They play an important role in modulation of calcineurin signaling.
CSK 1445 ENSG00000103653 c-src tyrosine kinase NA
PTPRG 5793 ENSG00000144724 protein tyrosine phosphatase, receptor type G The protein encoded by this gene is a member of the protein tyrosine phosphatase (PTP) family. PTPs are known to be signaling molecules that regulate a variety of cellular processes including cell growth, differentiation, mitotic cycle, and oncogenic transformation. This PTP possesses an extracellular region, a single transmembrane region, and two tandem intracytoplasmic catalytic domains, and thus represents a receptor-type PTP. The extracellular region of this PTP contains a carbonic anhydrase-like (CAH) domain, which is also found in the extracellular region of PTPRBETA/ZETA. This gene is located in a chromosomal region that is frequently deleted in renal cell carcinoma and lung carcinoma, thus is thought to be a candidate tumor suppressor gene.
PRSS3 5646 ENSG00000010438 protease, serine 3 This gene encodes a trypsinogen, which is a member of the trypsin family of serine proteases. This enzyme is expressed in the brain and pancreas and is resistant to common trypsin inhibitors. It is active on peptide linkages involving the carboxyl group of lysine or arginine. This gene is localized to the locus of T cell receptor beta variable orphans on chromosome 9. Four transcript variants encoding different isoforms have been described for this gene.
ACSL1 2180 ENSG00000151726 acyl-CoA synthetase long-chain family member 1 The protein encoded by this gene is an isozyme of the long-chain fatty-acid-coenzyme A ligase family. Although differing in substrate specificity, subcellular localization, and tissue distribution, all isozymes of this family convert free long-chain fatty acids into fatty acyl-CoA esters, and thereby play a key role in lipid biosynthesis and fatty acid degradation. Several transcript variants encoding different isoforms have been found for this gene.
B3GNT8 374907 ENSG00000177191 UDP-GlcNAc:betaGal beta-1,3-N-acetylglucosaminyltransferase 8 NA
ARHGAP27 201176 ENSG00000159314 Rho GTPase activating protein 27 This gene encodes a member of a large family of proteins that activate Rho-type guanosine triphosphate (GTP) metabolizing enzymes. The encoded protein may play a role in clathrin-mediated endocytosis. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene.
ADCK3 56997 ENSG00000163050 aarF domain containing kinase 3 This gene encodes a mitochondrial protein similar to yeast ABC1, which functions in an electron-transferring membrane protein complex in the respiratory chain. It is not related to the family of ABC transporter proteins. Expression of this gene is induced by the tumor suppressor p53 and in response to DNA damage, and inhibiting its expression partially suppresses p53-induced apoptosis. Alternatively spliced transcript variants have been found; however, their full-length nature has not been determined.
MYH6 4624 ENSG00000197616 myosin, heavy chain 6, cardiac muscle, alpha Cardiac muscle myosin is a hexamer consisting of two heavy chain subunits, two light chain subunits, and two regulatory subunits. This gene encodes the alpha heavy chain subunit of cardiac myosin. The gene is located 4kb downstream of the gene encoding the beta heavy chain subunit of cardiac myosin. Mutations in this gene cause familial hypertrophic cardiomyopathy and atrial septal defect 3.
IL1R2 7850 ENSG00000115590 interleukin 1 receptor type 2 The protein encoded by this gene is a cytokine receptor that belongs to the interleukin 1 receptor family. This protein binds interleukin alpha (IL1A), interleukin beta (IL1B), and interleukin 1 receptor, type I(IL1R1/IL1RA), and acts as a decoy receptor that inhibits the activity of its ligands. Interleukin 4 (IL4) is reported to antagonize the activity of interleukin 1 by inducing the expression and release of this cytokine. This gene and three other genes form a cytokine receptor gene cluster on chromosome 2q12. Alternative splicing results in multiple transcript variants and protein isoforms. Alternative splicing produces both membrane-bound and soluble proteins. A soluble protein is also produced by proteolytic cleavage.
TG 7038 ENSG00000042832 thyroglobulin Thyroglobulin (Tg) is a glycoprotein homodimer produced predominantly by the thyroid gland. It acts as a substrate for the synthesis of thyroxine and triiodothyronine as well as the storage of the inactive forms of thyroid hormone and iodine. Thyroglobulin is secreted from the endoplasmic reticulum to its site of iodination, and subsequent thyroxine biosynthesis, in the follicular lumen. Mutations in this gene cause thyroid dyshormonogenesis, manifested as goiter, and are associated with moderate to severe congenital hypothyroidism. Polymorphisms in this gene are associated with susceptibility to autoimmune thyroid diseases (AITD) such as Graves disease and Hashimoto thyroiditis.
ADIRF 10974 ENSG00000148671 adipogenesis regulatory factor APM2 gene is exclusively expressed in adipose tissue. Its function is currently unknown.
RAB5B 5869 ENSG00000111540 RAB5B, member RAS oncogene family NA
MMP25 64386 ENSG00000008516 matrix metallopeptidase 25 Proteins of the matrix metalloproteinase (MMP) family are involved in the breakdown of extracellular matrix in normal physiological processes, such as embryonic development, reproduction, and tissue remodeling, as well as in disease processes, such as arthritis and metastasis. Most MMPs are secreted as inactive proproteins which are activated when cleaved by extracellular proteinases. However, the protein encoded by this gene is a member of the membrane-type MMP (MT-MMP) subfamily, attached to the plasma membrane via a glycosylphosphatidyl inositol anchor. In response to bacterial infection or inflammation, the encoded protein is thought to inactivate alpha-1 proteinase inhibitor, a major tissue protectant against proteolytic enzymes released by activated neutrophils, facilitating the transendothelial migration of neutrophils to inflammatory sites. The encoded protein may also play a role in tumor invasion and metastasis through activation of MMP2. The gene has previously been referred to as MMP20 but has been renamed MMP25.
NLRX1 79671 ENSG00000160703 NLR family member X1 The protein encoded by this gene is a member of the NLR family and localizes to the outer mitochondrial membrane. The encoded protein is a regulator of mitochondrial antivirus responses. Three transcript variants encoding the same protein have been found for this gene.
EPAS1 2034 ENSG00000116016 endothelial PAS domain protein 1 This gene encodes a transcription factor involved in the induction of genes regulated by oxygen, which is induced as oxygen levels fall. The encoded protein contains a basic-helix-loop-helix domain protein dimerization domain as well as a domain found in proteins in signal transduction pathways which respond to oxygen levels. Mutations in this gene are associated with erythrocytosis familial type 4.
HK3 3101 ENSG00000160883 hexokinase 3 Hexokinases phosphorylate glucose to produce glucose-6-phosphate, the first step in most glucose metabolism pathways. This gene encodes hexokinase 3. Similar to hexokinases 1 and 2, this allosteric enzyme is inhibited by its product glucose-6-phosphate.
# Save the Ensembl IDs queried for factor 14, one per line.
write.table(as.factor(out$query), paste0("../utilities/GTEX2013_sparse_fac_sqrt/gene_names_clus_", 14, ".txt"),
            col.names = FALSE, row.names = FALSE, quote = FALSE);
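
The export step above hard-codes the factor index in the file name, which is easy to get out of sync with the factor being annotated. A minimal sketch of a helper that takes the index as an argument is given below. The name `write_factor_genes` and its arguments are hypothetical and not part of the original analysis, and a temporary directory stands in for `../utilities/GTEX2013_sparse_fac_sqrt/`.

```r
# Hypothetical helper (an assumption, not from the original analysis):
# write the query IDs of factor k to gene_names_clus_<k>.txt in out_dir.
write_factor_genes <- function(query_ids, k, out_dir) {
  path <- file.path(out_dir, paste0("gene_names_clus_", k, ".txt"))
  # Same write.table settings as the export step in the report:
  # no header, no row names, no quoting, one ID per line.
  write.table(as.character(query_ids), path,
              col.names = FALSE, row.names = FALSE, quote = FALSE)
  invisible(path)
}

# Stand-in for out$query; in the report this would come from mygene::queryMany.
ids <- c("ENSG00000042832", "ENSG00000245532")
p <- write_factor_genes(ids, 15, tempdir())
written <- readLines(p)
```

In the report itself, a call such as `write_factor_genes(out$query, 14, "../utilities/GTEX2013_sparse_fac_sqrt")` would be expected to replace the explicit `write.table` call, assuming that directory exists.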

Factor 15 Annotations

out <- mygene::queryMany(gene_list[15,],  scopes="ensembl.gene", fields=c("name", "summary", "symbol"), species="human");
## Finished
kable(as.data.frame(out))
name X_id summary symbol query
thyroglobulin 7038 Thyroglobulin (Tg) is a glycoprotein homodimer produced predominantly by the thyroid gland. It acts as a substrate for the synthesis of thyroxine and triiodothyronine as well as the storage of the inactive forms of thyroid hormone and iodine. Thyroglobulin is secreted from the endoplasmic reticulum to its site of iodination, and subsequent thyroxine biosynthesis, in the follicular lumen. Mutations in this gene cause thyroid dyshormonogenesis, manifested as goiter, and are associated with moderate to severe congenital hypothyroidism. Polymorphisms in this gene are associated with susceptibility to autoimmune thyroid diseases (AITD) such as Graves disease and Hashimoto thyroiditis. TG ENSG00000042832
nuclear paraspeckle assembly transcript 1 (non-protein coding) 283131 This gene produces a long non-coding RNA (lncRNA) transcribed from the multiple endocrine neoplasia locus. This lncRNA is retained in the nucleus where it forms the core structural component of the paraspeckle sub-organelles. It may act as a transcriptional regulator for numerous genes, including some genes involved in cancer progression. NEAT1 ENSG00000245532
thyroid peroxidase 7173 This gene encodes a membrane-bound glycoprotein. The encoded protein acts as an enzyme and plays a central role in thyroid gland function. The protein functions in the iodination of tyrosine residues in thyroglobulin and phenoxy-ester formation between pairs of iodinated tyrosines to generate the thyroid hormones, thyroxine and triiodothyronine. Mutations in this gene are associated with several disorders of thyroid hormonogenesis, including congenital hypothyroidism, congenital goiter, and thyroid hormone organification defect IIA. Multiple transcript variants encoding distinct isoforms have been identified for this gene, but the full-length nature of some variants has not been determined. TPO ENSG00000115705
paired box 8 7849 This gene encodes a member of the paired box (PAX) family of transcription factors. Members of this gene family typically encode proteins that contain a paired box domain, an octapeptide, and a paired-type homeodomain. This nuclear protein is involved in thyroid follicular cell development and expression of thyroid-specific genes. Mutations in this gene have been associated with thyroid dysgenesis, thyroid follicular carcinomas and atypical follicular thyroid adenomas. Alternatively spliced transcript variants encoding different isoforms have been described. PAX8 ENSG00000125618
surfactant protein B 6439 This gene encodes the pulmonary-associated surfactant protein B (SPB), an amphipathic surfactant protein essential for lung function and homeostasis after birth. Pulmonary surfactant is a surface-active lipoprotein complex composed of 90% lipids and 10% proteins which include plasma proteins and apolipoproteins SPA, SPB, SPC and SPD. The surfactant is secreted by the alveolar cells of the lung and maintains the stability of pulmonary tissue by reducing the surface tension of fluids that coat the lung. The SPB enhances the rate of spreading and increases the stability of surfactant monolayers in vitro. Multiple mutations in this gene have been identified, which cause pulmonary surfactant metabolism dysfunction type 1, also called pulmonary alveolar proteinosis due to surfactant protein B deficiency, and are associated with fatal respiratory distress in the neonatal period. Alternatively spliced transcript variants encoding the same protein have been identified. SFTPB ENSG00000168878
poly(A) binding protein cytoplasmic 1 26986 This gene encodes a poly(A) binding protein. The protein shuttles between the nucleus and cytoplasm and binds to the 3’ poly(A) tail of eukaryotic messenger RNAs via RNA-recognition motifs. The binding of this protein to poly(A) promotes ribosome recruitment and translation initiation; it is also required for poly(A) shortening which is the first step in mRNA decay. The gene is part of a small gene family including three protein-coding genes and several pseudogenes. PABPC1 ENSG00000070756
keratin 5 3852 The protein encoded by this gene is a member of the keratin gene family. The type II cytokeratins consist of basic or neutral proteins which are arranged in pairs of heterotypic keratin chains coexpressed during differentiation of simple and stratified epithelial tissues. This type II cytokeratin is specifically expressed in the basal layer of the epidermis with family member KRT14. Mutations in these genes have been associated with a complex of diseases termed epidermolysis bullosa simplex. The type II cytokeratins are clustered in a region of chromosome 12q12-q13. KRT5 ENSG00000186081
keratin 7 3855 The protein encoded by this gene is a member of the keratin gene family. The type II cytokeratins consist of basic or neutral proteins which are arranged in pairs of heterotypic keratin chains coexpressed during differentiation of simple and stratified epithelial tissues. This type II cytokeratin is specifically expressed in the simple epithelia lining the cavities of the internal organs and in the gland ducts and blood vessels. The genes encoding the type II cytokeratins are clustered in a region of chromosome 12q12-q13. Alternative splicing may result in several transcript variants; however, not all variants have been fully described. KRT7 ENSG00000135480
collagen type IV alpha 3 chain 1285 Type IV collagen, the major structural component of basement membranes, is a multimeric protein composed of 3 alpha subunits. These subunits are encoded by 6 different genes, alpha 1 through alpha 6, each of which can form a triple helix structure with 2 other subunits to form type IV collagen. This gene encodes alpha 3. In the Goodpasture syndrome, autoantibodies bind to the collagen molecules in the basement membranes of alveoli and glomeruli. The epitopes that elicit these autoantibodies are localized largely to the non-collagenous C-terminal domain of the protein. A specific kinase phosphorylates amino acids in this same C-terminal region and the expression of this kinase is upregulated during pathogenesis. This gene is also linked to an autosomal recessive form of Alport syndrome. The mutations contributing to this syndrome are also located within the exons that encode this C-terminal region. Like the other members of the type IV collagen gene family, this gene is organized in a head-to-head conformation with another type IV collagen gene so that each gene pair shares a common promoter. COL4A3 ENSG00000169031
alpha-2-macroglobulin 2 Alpha-2-macroglobulin is a protease inhibitor and cytokine transporter. It inhibits many proteases, including trypsin, thrombin and collagenase. A2M is implicated in Alzheimer disease (AD) due to its ability to mediate the clearance and degradation of A-beta, the major component of beta-amyloid deposits. A2M ENSG00000175899
titin 7273 This gene encodes a large abundant protein of striated muscle. The product of this gene is divided into two regions, a N-terminal I-band and a C-terminal A-band. The I-band, which is the elastic part of the molecule, contains two regions of tandem immunoglobulin domains on either side of a PEVK region that is rich in proline, glutamate, valine and lysine. The A-band, which is thought to act as a protein-ruler, contains a mixture of immunoglobulin and fibronectin repeats, and possesses kinase activity. An N-terminal Z-disc region and a C-terminal M-line region bind to the Z-line and M-line of the sarcomere, respectively, so that a single titin molecule spans half the length of a sarcomere. Titin also contains binding sites for muscle associated proteins so it serves as an adhesion template for the assembly of contractile machinery in muscle cells. It has also been identified as a structural protein for chromosomes. Alternative splicing of this gene results in multiple transcript variants. Considerable variability exists in the I-band, the M-line and the Z-disc regions of titin. Variability in the I-band region contributes to the differences in elasticity of different titin isoforms and, therefore, to the differences in elasticity of different muscle types. Mutations in this gene are associated with familial hypertrophic cardiomyopathy 9, and autoantibodies to titin are produced in patients with the autoimmune disease scleroderma. TTN ENSG00000155657
ZFP36 ring finger protein 7538 NA ZFP36 ENSG00000128016
creatine kinase, M-type 1158 The protein encoded by this gene is a cytoplasmic enzyme involved in energy homeostasis and is an important serum marker for myocardial infarction. The encoded protein reversibly catalyzes the transfer of phosphate between ATP and various phosphogens such as creatine phosphate. It acts as a homodimer in striated muscle as well as in other tissues, and as a heterodimer with a similar brain isozyme in heart. The encoded protein is a member of the ATP:guanido phosphotransferase protein family. CKM ENSG00000104879
nephronectin 255743 NA NPNT ENSG00000168743
collagen type IV alpha 4 chain 1286 This gene encodes one of the six subunits of type IV collagen, the major structural component of basement membranes. This particular collagen IV subunit, however, is only found in a subset of basement membranes. Like the other members of the type IV collagen gene family, this gene is organized in a head-to-head conformation with another type IV collagen gene so that each gene pair shares a common promoter. Mutations in this gene are associated with type II autosomal recessive Alport syndrome (hereditary glomerulonephropathy) and with familial benign hematuria (thin basement membrane disease). Two transcripts, differing only in their transcription start sites, have been identified for this gene and, as is common for collagen genes, multiple polyadenylation sites are found in the 3’ UTR. COL4A4 ENSG00000081052
lipase G, endothelial type 9388 The protein encoded by this gene has substantial phospholipase activity and may be involved in lipoprotein metabolism and vascular biology. This protein is designated a member of the TG lipase family by its sequence and characteristic lid region which provides substrate specificity for enzymes of the TG lipase family. LIPG ENSG00000101670
myosin, heavy chain 6, cardiac muscle, alpha 4624 Cardiac muscle myosin is a hexamer consisting of two heavy chain subunits, two light chain subunits, and two regulatory subunits. This gene encodes the alpha heavy chain subunit of cardiac myosin. The gene is located 4kb downstream of the gene encoding the beta heavy chain subunit of cardiac myosin. Mutations in this gene cause familial hypertrophic cardiomyopathy and atrial septal defect 3. MYH6 ENSG00000197616
inositol polyphosphate-5-phosphatase J 27124 NA INPP5J ENSG00000185133
eukaryotic translation elongation factor 1 alpha 1 1915 This gene encodes an isoform of the alpha subunit of the elongation factor-1 complex, which is responsible for the enzymatic delivery of aminoacyl tRNAs to the ribosome. This isoform (alpha 1) is expressed in brain, placenta, lung, liver, kidney, and pancreas, and the other isoform (alpha 2) is expressed in brain, heart and skeletal muscle. This isoform is identified as an autoantigen in 66% of patients with Felty syndrome. This gene has been found to have multiple copies on many chromosomes, some of which, if not all, represent different pseudogenes. EEF1A1 ENSG00000156508
actin, beta 60 This gene encodes one of six different actin proteins. Actins are highly conserved proteins that are involved in cell motility, structure, and integrity. This actin is a major constituent of the contractile apparatus and one of the two nonmuscle cytoskeletal actins. ACTB ENSG00000075624
small nucleolar RNA host gene 14 ENSG00000224078 NA SNHG14 ENSG00000224078
carboxypeptidase B1 1360 Three different procarboxypeptidases A and two different procarboxypeptidases B have been isolated. The B1 and B2 forms differ from each other mainly in isoelectric point. Carboxypeptidase B1 is a highly tissue-specific protein and is a useful serum marker for acute pancreatitis and dysfunction of pancreatic transplants. It is not elevated in pancreatic carcinoma. CPB1 ENSG00000153002
serpin family E member 1 5054 This gene encodes a member of the serine proteinase inhibitor (serpin) superfamily. This member is the principal inhibitor of tissue plasminogen activator (tPA) and urokinase (uPA), and hence is an inhibitor of fibrinolysis. Defects in this gene are the cause of plasminogen activator inhibitor-1 deficiency (PAI-1 deficiency), and high concentrations of the gene product are associated with thrombophilia. Alternatively spliced transcript variants encoding different isoforms have been found for this gene. SERPINE1 ENSG00000106366
phosphatidylethanolamine binding protein 4 157310 The phosphatidylethanolamine (PE)-binding proteins, including PEBP4, are an evolutionarily conserved family of proteins with pivotal biologic functions, such as lipid binding and inhibition of serine proteases (Wang et al., 2004 [PubMed 15302887]). PEBP4 ENSG00000134020
cardiomyopathy associated 5 202333 NA CMYA5 ENSG00000164309
latent transforming growth factor beta binding protein 2 4053 The protein encoded by this gene belongs to the family of latent transforming growth factor (TGF)-beta binding proteins (LTBP), which are extracellular matrix proteins with multi-domain structure. This protein is the largest member of the LTBP family possessing unique regions and with most similarity to the fibrillins. It has thus been suggested that it may have multiple functions: as a member of the TGF-beta latent complex, as a structural component of microfibrils, and a role in cell adhesion. LTBP2 ENSG00000119681
myosin light chain 6 4637 Myosin is a hexameric ATPase cellular motor protein. It is composed of two heavy chains, two nonphosphorylatable alkali light chains, and two phosphorylatable regulatory light chains. This gene encodes a myosin alkali light chain that is expressed in smooth muscle and non-muscle tissues. Genomic sequences representing several pseudogenes have been described and two transcript variants encoding different isoforms have been identified for this gene. MYL6 ENSG00000092841
ral guanine nucleotide dissociation stimulator like 3 57139 NA RGL3 ENSG00000205517
complement factor D 1675 This gene encodes a member of the S1, or chymotrypsin, family of serine peptidases. This protease catalyzes the cleavage of factor B, the rate-limiting step of the alternative pathway of complement activation. This protein also functions as an adipokine, a cell signaling protein secreted by adipocytes, which regulates insulin secretion in mice. Mutations in this gene underlie complement factor D deficiency, which is associated with recurrent bacterial meningitis infections in human patients. Alternative splicing of this gene results in multiple transcript variants. At least one of these variants encodes a preproprotein that is proteolytically processed to generate the mature protease. CFD ENSG00000197766
dedicator of cytokinesis 5 80005 NA DOCK5 ENSG00000147459
collagen type I alpha 2 chain 1278 This gene encodes the pro-alpha2 chain of type I collagen whose triple helix comprises two alpha1 chains and one alpha2 chain. Type I is a fibril-forming collagen found in most connective tissues and is abundant in bone, cornea, dermis and tendon. Mutations in this gene are associated with osteogenesis imperfecta types I-IV, Ehlers-Danlos syndrome type VIIB, recessive Ehlers-Danlos syndrome Classical type, idiopathic osteoporosis, and atypical Marfan syndrome. Symptoms associated with mutations in this gene, however, tend to be less severe than mutations in the gene for the alpha1 chain of type I collagen (COL1A1) reflecting the different role of alpha2 chains in matrix integrity. Three transcripts, resulting from the use of alternate polyadenylation signals, have been identified for this gene. COL1A2 ENSG00000164692
erythropoietin receptor 2057 This gene encodes the erythropoietin receptor which is a member of the cytokine receptor family. Upon erythropoietin binding, this receptor activates Jak2 tyrosine kinase which activates different intracellular pathways including: Ras/MAP kinase, phosphatidylinositol 3-kinase and STAT transcription factors. The stimulated erythropoietin receptor appears to have a role in erythroid cell survival. Defects in the erythropoietin receptor may produce erythroleukemia and familial erythrocytosis. Dysregulation of this gene may affect the growth of certain tumors. Alternate splicing results in multiple transcript variants. EPOR ENSG00000187266
crystallin alpha B 1410 Mammalian lens crystallins are divided into alpha, beta, and gamma families. Alpha crystallins are composed of two gene products: alpha-A and alpha-B, for acidic and basic, respectively. Alpha crystallins can be induced by heat shock and are members of the small heat shock protein (HSP20) family. They act as molecular chaperones although they do not renature proteins and release them in the fashion of a true chaperone; instead they hold them in large soluble aggregates. Post-translational modifications decrease the ability to chaperone. These heterogeneous aggregates consist of 30-40 subunits; the alpha-A and alpha-B subunits have a 3:1 ratio, respectively. Two additional functions of alpha crystallins are an autokinase activity and participation in the intracellular architecture. The encoded protein has been identified as a moonlighting protein based on its ability to perform mechanistically distinct functions. Alpha-A and alpha-B gene products are differentially expressed; alpha-A is preferentially restricted to the lens and alpha-B is expressed widely in many tissues and organs. Elevated expression of alpha-B crystallin occurs in many neurological diseases; a missense mutation cosegregated in a family with a desmin-related myopathy. Alternative splicing results in multiple transcript variants. CRYAB ENSG00000109846
collagen type XXIII alpha 1 chain 91522 COL23A1 is a member of the transmembrane collagens, a subfamily of the nonfibrillar collagens that contain a single pass hydrophobic transmembrane domain (Banyard et al., 2003 [PubMed 12644459]). COL23A1 ENSG00000050767
keratin 6A 3853 The protein encoded by this gene is a member of the keratin gene family. The type II cytokeratins consist of basic or neutral proteins which are arranged in pairs of heterotypic keratin chains coexpressed during differentiation of simple and stratified epithelial tissues. As many as six of this type II cytokeratin (KRT6) have been identified; the multiplicity of the genes is attributed to successive gene duplication events. The genes are expressed with family members KRT16 and/or KRT17 in the filiform papillae of the tongue, the stratified epithelial lining of oral mucosa and esophagus, the outer root sheath of hair follicles, and the glandular epithelia. This KRT6 gene in particular encodes the most abundant isoform. Mutations in these genes have been associated with pachyonychia congenita. In addition, peptides from the C-terminal region of the protein have antimicrobial activity against bacterial pathogens. The type II cytokeratins are clustered in a region of chromosome 12q12-q13. KRT6A ENSG00000205420
chromosome 1 open reading frame 198 84886 NA C1orf198 ENSG00000119280
advanced glycosylation end product-specific receptor 177 The advanced glycosylation end product (AGE) receptor encoded by this gene is a member of the immunoglobulin superfamily of cell surface receptors. It is a multiligand receptor, and besides AGE, interacts with other molecules implicated in homeostasis, development, and inflammation, and certain diseases, such as diabetes and Alzheimer’s disease. Many alternatively spliced transcript variants encoding different isoforms, as well as non-protein-coding variants, have been described for this gene (PMID:18089847). AGER ENSG00000204305
glycoprotein 2 2813 This gene encodes an integral membrane protein that is secreted from intracellular zymogen granules and associates with the plasma membrane via glycosylphosphatidylinositol (GPI) linkage. The encoded protein binds pathogens such as enterobacteria, thereby playing an important role in the innate immune response. The C-terminus of this protein is related to the C-terminus of the protein encoded by the neighboring gene, uromodulin (UMOD). Alternative splicing results in multiple transcript variants. GP2 ENSG00000169347
complement component 7 730 C7 is a component of the complement system. It participates in the formation of Membrane Attack Complex (MAC). People with C7 deficiency are prone to bacterial infection. C7 ENSG00000112936
ring finger protein 144B 255488 NA RNF144B ENSG00000137393
carboxypeptidase A1 1357 This gene encodes a member of the carboxypeptidase A family of zinc metalloproteases. This enzyme is produced in the pancreas and preferentially cleaves C-terminal branched-chain and aromatic amino acids from dietary proteins. This gene and several family members are present in a gene cluster on chromosome 7. Mutations in this gene may be linked to chronic pancreatitis, while elevated protein levels may be associated with pancreatic cancer. CPA1 ENSG00000091704
metallothionein 2A 4502 NA MT2A ENSG00000125148
glycerol-3-phosphate dehydrogenase 1 2819 This gene encodes a member of the NAD-dependent glycerol-3-phosphate dehydrogenase family. The encoded protein plays a critical role in carbohydrate and lipid metabolism by catalyzing the reversible conversion of dihydroxyacetone phosphate (DHAP) and reduced nicotinamide adenine dinucleotide (NADH) to glycerol-3-phosphate (G3P) and NAD+. The encoded cytosolic protein and mitochondrial glycerol-3-phosphate dehydrogenase also form a glycerol phosphate shuttle that facilitates the transfer of reducing equivalents from the cytosol to mitochondria. Mutations in this gene are a cause of transient infantile hypertriglyceridemia. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene. GPD1 ENSG00000167588
phosphoenolpyruvate carboxykinase 1 5105 This gene is a main control point for the regulation of gluconeogenesis. The cytosolic enzyme encoded by this gene, along with GTP, catalyzes the formation of phosphoenolpyruvate from oxaloacetate, with the release of carbon dioxide and GDP. The expression of this gene can be regulated by insulin, glucocorticoids, glucagon, cAMP, and diet. Defects in this gene are a cause of cytosolic phosphoenolpyruvate carboxykinase deficiency. A mitochondrial isozyme of the encoded protein also has been characterized. PCK1 ENSG00000124253
LIM domain 7 4008 This gene encodes a protein containing a calponin homology (CH) domain, a PDZ domain, and a LIM domain, and may be involved in protein-protein interactions. Several alternatively spliced transcript variants encoding different isoforms have been found for this gene; however, the full-length nature of some variants is not known. LMO7 ENSG00000136153
integrin subunit alpha 3 3675 The gene encodes a member of the integrin alpha chain family of proteins. Integrins are heterodimeric integral membrane proteins composed of an alpha chain and a beta chain that function as cell surface adhesion molecules. The encoded preproprotein is proteolytically processed to generate light and heavy chains that comprise the alpha 3 subunit. This subunit joins with a beta 1 subunit to form an integrin that interacts with extracellular matrix proteins including members of the laminin family. Expression of this gene may be correlated with breast cancer metastasis. ITGA3 ENSG00000005884
H19, imprinted maternally expressed transcript (non-protein coding) 283120 This gene is located in an imprinted region of chromosome 11 near the insulin-like growth factor 2 (IGF2) gene. This gene is only expressed from the maternally-inherited chromosome, whereas IGF2 is only expressed from the paternally-inherited chromosome. The product of this gene is a long non-coding RNA which functions as a tumor suppressor. Mutations in this gene have been associated with Beckwith-Wiedemann Syndrome and Wilms tumorigenesis. Alternative splicing results in multiple transcript variants. H19 ENSG00000130600
ribosomal protein L3 6122 Ribosomes, the complexes that catalyze protein synthesis, consist of a small 40S subunit and a large 60S subunit. Together these subunits are composed of 4 RNA species and approximately 80 structurally distinct proteins. This gene encodes a ribosomal protein that is a component of the 60S subunit. The protein belongs to the L3P family of ribosomal proteins and it is located in the cytoplasm. The protein can bind to the HIV-1 TAR mRNA, and it has been suggested that the protein contributes to tat-mediated transactivation. This gene is co-transcribed with several small nucleolar RNA genes, which are located in several of this gene’s introns. Alternate transcriptional splice variants, encoding different isoforms, have been characterized. As is typical for genes encoding ribosomal proteins, there are multiple processed pseudogenes of this gene dispersed through the genome. RPL3 ENSG00000100316
CD248 molecule 57124 NA CD248 ENSG00000174807
retinol binding protein 4 5950 This protein belongs to the lipocalin family and is the specific carrier for retinol (vitamin A alcohol) in the blood. It delivers retinol from the liver stores to the peripheral tissues. In plasma, the RBP-retinol complex interacts with transthyretin which prevents its loss by filtration through the kidney glomeruli. A deficiency of vitamin A blocks secretion of the binding protein posttranslationally and results in defective delivery and supply to the epidermal cells. RBP4 ENSG00000138207
transglutaminase 3 7053 Transglutaminases are enzymes that catalyze the crosslinking of proteins by epsilon-gamma glutamyl lysine isopeptide bonds. While the primary structure of transglutaminases is not conserved, they all have the same amino acid sequence at their active sites and their activity is calcium-dependent. The protein encoded by this gene consists of two polypeptide chains activated from a single precursor protein by proteolysis. The encoded protein is involved in the later stages of cell envelope formation in the epidermis and hair follicle. TGM3 ENSG00000125780
actinin alpha 2 88 Alpha actinins belong to the spectrin gene superfamily which represents a diverse group of cytoskeletal proteins, including the alpha and beta spectrins and dystrophins. Alpha actinin is an actin-binding protein with multiple roles in different cell types. In nonmuscle cells, the cytoskeletal isoform is found along microfilament bundles and adherens-type junctions, where it is involved in binding actin to the membrane. In contrast, skeletal, cardiac, and smooth muscle isoforms are localized to the Z-disc and analogous dense bodies, where they help anchor the myofibrillar actin filaments. This gene encodes a muscle-specific, alpha actinin isoform that is expressed in both skeletal and cardiac muscles. Several transcript variants encoding different isoforms have been found for this gene. ACTN2 ENSG00000077522
endothelial PAS domain protein 1 2034 This gene encodes a transcription factor involved in the induction of genes regulated by oxygen, which is induced as oxygen levels fall. The encoded protein contains a basic-helix-loop-helix domain protein dimerization domain as well as a domain found in proteins in signal transduction pathways which respond to oxygen levels. Mutations in this gene are associated with erythrocytosis familial type 4. EPAS1 ENSG00000116016
elastin 2006 This gene encodes a protein that is one of the two components of elastic fibers. The encoded protein is rich in hydrophobic amino acids such as glycine and proline, which form mobile hydrophobic regions bounded by crosslinks between lysine residues. Deletions and mutations in this gene are associated with supravalvular aortic stenosis (SVAS) and autosomal dominant cutis laxa. Multiple transcript variants encoding different isoforms have been found for this gene. ELN ENSG00000049540
solute carrier family 4 member 11 83959 This gene encodes a voltage-regulated, electrogenic sodium-coupled borate cotransporter that is essential for borate homeostasis, cell growth and cell proliferation. Mutations in this gene have been associated with a number of endothelial corneal dystrophies including recessive corneal endothelial dystrophy 2, corneal dystrophy and perceptive deafness, and Fuchs endothelial corneal dystrophy. Multiple transcript variants encoding different isoforms have been described. SLC4A11 ENSG00000088836
ST3 beta-galactoside alpha-2,3-sialyltransferase 1 6482 The protein encoded by this gene is a type II membrane protein that catalyzes the transfer of sialic acid from CMP-sialic acid to galactose-containing substrates. The encoded protein is normally found in the Golgi but can be proteolytically processed to a soluble form. Correct glycosylation of the encoded protein may be critical to its sialyltransferase activity. This protein, which is a member of glycosyltransferase family 29, can use the same acceptor substrates as does sialyltransferase 4B. Two transcript variants encoding the same protein have been found for this gene. Other transcript variants may exist, but have not been fully characterized yet. ST3GAL1 ENSG00000008513
tropomyosin 1 (alpha) 7168 This gene is a member of the tropomyosin family of highly conserved, widely distributed actin-binding proteins involved in the contractile system of striated and smooth muscles and the cytoskeleton of non-muscle cells. Tropomyosin is composed of two alpha-helical chains arranged as a coiled-coil. It is polymerized end to end along the two grooves of actin filaments and provides stability to the filaments. The encoded protein is one type of alpha helical chain that forms the predominant tropomyosin of striated muscle, where it also functions in association with the troponin complex to regulate the calcium-dependent interaction of actin and myosin during muscle contraction. In smooth muscle and non-muscle cells, alternatively spliced transcript variants encoding a range of isoforms have been described. Mutations in this gene are associated with type 3 familial hypertrophic cardiomyopathy. TPM1 ENSG00000140416
actin, alpha 2, smooth muscle, aorta 59 The protein encoded by this gene belongs to the actin family of proteins, which are highly conserved proteins that play a role in cell motility, structure and integrity. Alpha, beta and gamma actin isoforms have been identified, with alpha actins being a major constituent of the contractile apparatus, while beta and gamma actins are involved in the regulation of cell motility. This actin is an alpha actin that is found in skeletal muscle. Defects in this gene cause aortic aneurysm familial thoracic type 6. Multiple alternatively spliced variants, encoding the same protein, have been identified. ACTA2 ENSG00000107796
coronin 6 84940 NA CORO6 ENSG00000167549
keratin 13 3860 The protein encoded by this gene is a member of the keratin gene family. The keratins are intermediate filament proteins responsible for the structural integrity of epithelial cells and are subdivided into cytokeratins and hair keratins. Most of the type I cytokeratins consist of acidic proteins which are arranged in pairs of heterotypic keratin chains. This type I cytokeratin is paired with keratin 4 and expressed in the suprabasal layers of non-cornified stratified epithelia. Mutations in this gene and keratin 4 have been associated with the autosomal dominant disorder White Sponge Nevus. The type I cytokeratins are clustered in a region of chromosome 17q21.2. Alternative splicing of this gene results in multiple transcript variants; however, not all variants have been described. KRT13 ENSG00000171401
rhophilin, Rho GTPase binding protein 1 114822 NA RHPN1 ENSG00000158106
protease, serine 1 5644 This gene encodes a trypsinogen, which is a member of the trypsin family of serine proteases. This enzyme is secreted by the pancreas and cleaved to its active form in the small intestine. It is active on peptide linkages involving the carboxyl group of lysine or arginine. Mutations in this gene are associated with hereditary pancreatitis. This gene and several other trypsinogen genes are localized to the T cell receptor beta locus on chromosome 7. PRSS1 ENSG00000204983
regenerating family member 1 alpha 5967 This gene is a type I subclass member of the Reg gene family. The Reg gene family is a multigene family grouped into four subclasses, types I, II, III and IV, based on the primary structures of the encoded proteins. This gene encodes a protein that is secreted by the exocrine pancreas. It is associated with islet cell regeneration and diabetogenesis and may be involved in pancreatic lithogenesis. Reg family members REG1B, REGL, PAP and this gene are tandemly clustered on chromosome 2p12 and may have arisen from the same ancestral gene by gene duplication. REG1A ENSG00000115386
dual oxidase 1 53905 The protein encoded by this gene is a glycoprotein and a member of the NADPH oxidase family. The synthesis of thyroid hormone is catalyzed by a protein complex located at the apical membrane of thyroid follicular cells. This complex contains an iodide transporter, thyroperoxidase, and a peroxide generating system that includes proteins encoded by this gene and the similar DUOX2 gene. This protein is known as dual oxidase because it has both a peroxidase homology domain and a gp91phox domain. This protein generates hydrogen peroxide and thereby plays a role in the activity of thyroid peroxidase, lactoperoxidase, and in lactoperoxidase-mediated antimicrobial defense at mucosal surfaces. Two alternatively spliced transcript variants encoding the same protein have been described for this gene. DUOX1 ENSG00000137857
actin, alpha, cardiac muscle 1 70 Actins are highly conserved proteins that are involved in various types of cell motility. Polymerization of globular actin (G-actin) leads to a structural filament (F-actin) in the form of a two-stranded helix. Each actin can bind to four others. The protein encoded by this gene belongs to the actin family which is comprised of three main groups of actin isoforms, alpha, beta, and gamma. The alpha actins are found in muscle tissues and are a major constituent of the contractile apparatus. Defects in this gene have been associated with idiopathic dilated cardiomyopathy (IDC) and familial hypertrophic cardiomyopathy (FHC). ACTC1 ENSG00000159251
albumin 213 Albumin is a soluble, monomeric protein which comprises about one-half of the blood serum protein. Albumin functions primarily as a carrier protein for steroids, fatty acids, and thyroid hormones and plays a role in stabilizing extracellular fluid volume. Albumin is a globular unglycosylated serum protein of molecular weight 65,000. Albumin is synthesized in the liver as preproalbumin which has an N-terminal peptide that is removed before the nascent protein is released from the rough endoplasmic reticulum. The product, proalbumin, is in turn cleaved in the Golgi vesicles to produce the secreted albumin. ALB ENSG00000163631
S100 calcium binding protein B 6285 The protein encoded by this gene is a member of the S100 family of proteins containing 2 EF-hand calcium-binding motifs. S100 proteins are localized in the cytoplasm and/or nucleus of a wide range of cells, and involved in the regulation of a number of cellular processes such as cell cycle progression and differentiation. S100 genes include at least 13 members which are located as a cluster on chromosome 1q21; however, this gene is located at 21q22.3. This protein may function in neurite extension, proliferation of melanoma cells, stimulation of Ca2+ fluxes, inhibition of PKC-mediated phosphorylation, astrocytosis and axonal proliferation, and inhibition of microtubule assembly. Chromosomal rearrangements and altered expression of this gene have been implicated in several neurological, neoplastic, and other types of diseases, including Alzheimer’s disease, Down’s syndrome, epilepsy, amyotrophic lateral sclerosis, melanoma, and type I diabetes. S100B ENSG00000160307
ribosomal protein S8 6202 Ribosomes, the organelles that catalyze protein synthesis, consist of a small 40S subunit and a large 60S subunit. Together these subunits are composed of 4 RNA species and approximately 80 structurally distinct proteins. This gene encodes a ribosomal protein that is a component of the 40S subunit. The protein belongs to the S8E family of ribosomal proteins. It is located in the cytoplasm. Increased expression of this gene in colorectal tumors and colon polyps compared to matched normal colonic mucosa has been observed. This gene is co-transcribed with the small nucleolar RNA genes U38A, U38B, U39, and U40, which are located in its fourth, fifth, first, and second introns, respectively. As is typical for genes encoding ribosomal proteins, there are multiple processed pseudogenes of this gene dispersed through the genome. RPS8 ENSG00000142937
pleckstrin homology, MyTH4 and FERM domain containing H1 57475 NA PLEKHH1 ENSG00000054690
chymotrypsin like elastase family member 3A 10136 Elastases form a subfamily of serine proteases that hydrolyze many proteins in addition to elastin. Humans have six elastase genes which encode the structurally similar proteins elastase 1, 2, 2A, 2B, 3A, and 3B. Unlike other elastases, elastase 3A has little elastolytic activity. Like most of the human elastases, elastase 3A is secreted from the pancreas as a zymogen and, like other serine proteases such as trypsin, chymotrypsin and kallikrein, it has a digestive function in the intestine. Elastase 3A preferentially cleaves proteins after alanine residues. Elastase 3A may also function in the intestinal transport and metabolism of cholesterol. Both elastase 3A and elastase 3B have been referred to as protease E and as elastase 1. CELA3A ENSG00000142789
glutathione peroxidase 3 2878 This gene product belongs to the glutathione peroxidase family, which functions in the detoxification of hydrogen peroxide. It contains a selenocysteine (Sec) residue at its active site. The selenocysteine is encoded by the UGA codon, which normally signals translation termination. The 3’ UTR of Sec-containing genes have a common stem-loop structure, the sec insertion sequence (SECIS), which is necessary for the recognition of UGA as a Sec codon rather than as a stop signal. GPX3 ENSG00000211445
ribosomal protein S12 6206 Ribosomes, the organelles that catalyze protein synthesis, consist of a small 40S subunit and a large 60S subunit. Together these subunits are composed of 4 RNA species and approximately 80 structurally distinct proteins. This gene encodes a ribosomal protein that is a component of the 40S subunit. The protein belongs to the S12E family of ribosomal proteins. It is located in the cytoplasm. Increased expression of this gene in colorectal cancers compared to matched normal colonic mucosa has been observed. As is typical for genes encoding ribosomal proteins, there are multiple processed pseudogenes of this gene dispersed through the genome. RPS12 ENSG00000112306
ACTA2 antisense RNA 1 ENSG00000180139 NA ACTA2-AS1 ENSG00000180139
vascular endothelial growth factor A 7422 This gene is a member of the PDGF/VEGF growth factor family. It encodes a heparin-binding protein, which exists as a disulfide-linked homodimer. This growth factor induces proliferation and migration of vascular endothelial cells, and is essential for both physiological and pathological angiogenesis. Disruption of this gene in mice resulted in abnormal embryonic blood vessel formation. This gene is upregulated in many known tumors and its expression is correlated with tumor stage and progression. Elevated levels of this protein are found in patients with POEMS syndrome, also known as Crow-Fukase syndrome. Allelic variants of this gene have been associated with microvascular complications of diabetes 1 (MVCD1) and atherosclerosis. Alternatively spliced transcript variants encoding different isoforms have been described. There is also evidence for alternative translation initiation from upstream non-AUG (CUG) codons resulting in additional isoforms. A recent study showed that a C-terminally extended isoform is produced by use of an alternative in-frame translation termination codon via a stop codon readthrough mechanism, and that this isoform is antiangiogenic. Expression of some isoforms derived from the AUG start codon is regulated by a small upstream open reading frame, which is located within an internal ribosome entry site. VEGFA ENSG00000112715
basal cell adhesion molecule (Lutheran blood group) 4059 This gene encodes Lutheran blood group glycoprotein, a member of the immunoglobulin superfamily and a receptor for the extracellular matrix protein, laminin. The protein contains five extracellular immunoglobulin domains, a single transmembrane domain, and a short C-terminal cytoplasmic tail. This protein may play a role in epithelial cell cancer and in vaso-occlusion of red blood cells in sickle cell disease. Polymorphisms in this gene define some of the antigens in the Lutheran system and also the Auberger system. Inactivating variants of this gene result in the recessive Lutheran null phenotype, Lu(a-b-), of the Lutheran blood group. Two transcript variants encoding different isoforms have been found for this gene. BCAM ENSG00000187244
desmin 1674 This gene encodes a muscle-specific class III intermediate filament. Homopolymers of this protein form a stable intracytoplasmic filamentous network connecting myofibrils to each other and to the plasma membrane. Mutations in this gene are associated with desmin-related myopathy, a familial cardiac and skeletal myopathy (CSM), and with distal myopathies. DES ENSG00000175084
nebulin related anchoring protein 4892 NA NRAP ENSG00000197893
LY6/PLAUR domain containing 3 27076 NA LYPD3 ENSG00000124466
CD59 molecule 966 This gene encodes a cell surface glycoprotein that regulates complement-mediated cell lysis, and it is involved in lymphocyte signal transduction. This protein is a potent inhibitor of the complement membrane attack complex, whereby it binds complement C8 and/or C9 during the assembly of this complex, thereby inhibiting the incorporation of multiple copies of C9 into the complex, which is necessary for osmolytic pore formation. This protein also plays a role in signal transduction pathways in the activation of T cells. Mutations in this gene cause CD59 deficiency, a disease resulting in hemolytic anemia and thrombosis, and which causes cerebral infarction. Multiple alternatively spliced transcript variants, which encode the same protein, have been identified for this gene. CD59 ENSG00000085063
serum amyloid A1 6288 This gene encodes a member of the serum amyloid A family of apolipoproteins. The encoded preproprotein is proteolytically processed to generate the mature protein. This protein is a major acute phase protein that is highly expressed in response to inflammation and tissue injury. This protein also plays an important role in HDL metabolism and cholesterol homeostasis. High levels of this protein are associated with chronic inflammatory diseases including atherosclerosis, rheumatoid arthritis, Alzheimer’s disease and Crohn’s disease. This protein may also be a potential biomarker for certain tumors. Alternate splicing results in multiple transcript variants that encode the same protein. A pseudogene of this gene is found on chromosome 11. SAA1 ENSG00000173432
myosin phosphatase Rho interacting protein 23164 NA MPRIP ENSG00000133030
inhibitor of DNA binding 3, HLH protein 3399 The protein encoded by this gene is a helix-loop-helix (HLH) protein that can form heterodimers with other HLH proteins. However, the encoded protein lacks a basic DNA-binding domain and therefore inhibits the DNA binding of any HLH protein with which it interacts. ID3 ENSG00000117318
ubiquitin specific peptidase 54 159195 NA USP54 ENSG00000166348
transcription factor CP2-like 1 29842 NA TFCP2L1 ENSG00000115112
CAP-Gly domain containing linker protein 3 25999 This gene encodes a member of the cytoplasmic linker protein 170 family. Members of this protein family contain a cytoskeleton-associated protein glycine-rich domain and mediate the interaction of microtubules with cellular organelles. The encoded protein plays a role in T cell apoptosis by facilitating the association of tubulin and the lipid raft ganglioside GD3. The encoded protein also functions as a scaffold protein mediating membrane localization of phosphorylated protein kinase B. Alternatively spliced transcript variants have been observed for this gene. CLIP3 ENSG00000105270
ribosomal protein lateral stalk subunit P2 6181 Ribosomes, the organelles that catalyze protein synthesis, consist of a small 40S subunit and a large 60S subunit. Together these subunits are composed of 4 RNA species and approximately 80 structurally distinct proteins. This gene encodes a ribosomal phosphoprotein that is a component of the 60S subunit. The protein, which is a functional equivalent of the E. coli L7/L12 ribosomal protein, belongs to the L12P family of ribosomal proteins. It plays an important role in the elongation step of protein synthesis. Unlike most ribosomal proteins, which are basic, the encoded protein is acidic. Its C-terminal end is nearly identical to the C-terminal ends of the ribosomal phosphoproteins P0 and P1. The P2 protein can interact with P0 and P1 to form a pentameric complex consisting of P1 and P2 dimers, and a P0 monomer. The protein is located in the cytoplasm. As is typical for genes encoding ribosomal proteins, there are multiple processed pseudogenes of this gene dispersed through the genome. RPLP2 ENSG00000177600
TEA domain transcription factor 4 7004 This gene product is a member of the transcriptional enhancer factor (TEF) family of transcription factors, which contain the TEA/ATTS DNA-binding domain. It is preferentially expressed in the skeletal muscle, and binds to the M-CAT regulatory element found in promoters of muscle-specific genes to direct their gene expression. Alternatively spliced transcripts encoding distinct isoforms, some of which are translated through the use of a non-AUG (UUG) initiation codon, have been described for this gene. TEAD4 ENSG00000197905
amine oxidase, copper containing 3 8639 This gene encodes a member of the semicarbazide-sensitive amine oxidase family. Copper amine oxidases catalyze the oxidative conversion of amines to aldehydes in the presence of copper and quinone cofactor. The encoded protein is localized to the cell surface, has adhesive properties as well as monoamine oxidase activity, and may be involved in leukocyte trafficking. Alterations in levels of the encoded protein may be associated with many diseases, including diabetes mellitus. A pseudogene of this gene has been described and is located approximately 9-kb downstream on the same chromosome. Alternative splicing results in multiple transcript variants. AOC3 ENSG00000131471
calreticulin 811 Calreticulin is a multifunctional protein that acts as a major Ca(2+)-binding (storage) protein in the lumen of the endoplasmic reticulum. It is also found in the nucleus, suggesting that it may have a role in transcription regulation. Calreticulin binds to the synthetic peptide KLGFFKR, which is almost identical to an amino acid sequence in the DNA-binding domain of the superfamily of nuclear receptors. Calreticulin binds to antibodies in certain sera of systemic lupus and Sjogren patients which contain anti-Ro/SSA antibodies, it is highly conserved among species, and it is located in the endoplasmic and sarcoplasmic reticulum where it may bind calcium. The amino terminus of calreticulin interacts with the DNA-binding domain of the glucocorticoid receptor and prevents the receptor from binding to its specific glucocorticoid response element. Calreticulin can inhibit the binding of androgen receptor to its hormone-responsive DNA element and can inhibit androgen receptor and retinoic acid receptor transcriptional activities in vivo, as well as retinoic acid-induced neuronal differentiation. Thus, calreticulin can act as an important modulator of the regulation of gene transcription by nuclear hormone receptors. Systemic lupus erythematosus is associated with increased autoantibody titers against calreticulin but calreticulin is not a Ro/SS-A antigen. Earlier papers referred to calreticulin as an Ro/SS-A antigen but this was later disproven. Increased autoantibody titer against human calreticulin is found in infants with complete congenital heart block of both the IgG and IgM classes. CALR ENSG00000179218
death associated protein kinase 2 23604 This gene encodes a protein that belongs to the serine/threonine protein kinase family. This protein contains a N-terminal protein kinase domain followed by a conserved calmodulin-binding domain with significant similarity to that of death-associated protein kinase 1 (DAPK1), a positive regulator of programmed cell death. Overexpression of this gene was shown to induce cell apoptosis. It uses multiple polyadenylation sites. DAPK2 ENSG00000035664
SKI-like proto-oncogene 6498 The protein encoded by this gene is a component of the SMAD pathway, which regulates cell growth and differentiation through transforming growth factor-beta (TGFB). In the absence of ligand, the encoded protein binds to the promoter region of TGFB-responsive genes and recruits a nuclear repressor complex. TGFB signaling causes SMAD3 to enter the nucleus and degrade this protein, allowing these genes to be activated. Four transcript variants encoding three different isoforms have been found for this gene. SKIL ENSG00000136603
myosin binding protein C, slow type 4604 This gene encodes a member of the myosin-binding protein C family. Myosin-binding protein C family members are myosin-associated proteins found in the cross-bridge-bearing zone (C region) of A bands in striated muscle. The encoded protein is the slow skeletal muscle isoform of myosin-binding protein C and plays an important role in muscle contraction by recruiting muscle-type creatine kinase to myosin filaments. Mutations in this gene are associated with distal arthrogryposis type I. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene. MYBPC1 ENSG00000196091
keratin 4 3851 The protein encoded by this gene is a member of the keratin gene family. The type II cytokeratins consist of basic or neutral proteins which are arranged in pairs of heterotypic keratin chains coexpressed during differentiation of simple and stratified epithelial tissues. This type II cytokeratin is specifically expressed in differentiated layers of the mucosal and esophageal epithelia with family member KRT13. Mutations in these genes have been associated with White Sponge Nevus, characterized by oral, esophageal, and anal leukoplakia. The type II cytokeratins are clustered in a region of chromosome 12q12-q13. KRT4 ENSG00000170477
immunoglobulin heavy constant gamma 1 (G1m marker) ENSG00000211896 NA IGHG1 ENSG00000211896
frizzled class receptor 1 8321 Members of the ‘frizzled’ gene family encode 7-transmembrane domain proteins that are receptors for Wnt signaling proteins. The FZD1 protein contains a signal peptide, a cysteine-rich domain in the N-terminal extracellular region, 7 transmembrane domains, and a C-terminal PDZ domain-binding motif. The FZD1 transcript is expressed in various tissues. FZD1 ENSG00000157240
epoxide hydrolase 1 2052 Epoxide hydrolase is a critical biotransformation enzyme that converts epoxides from the degradation of aromatic compounds to trans-dihydrodiols which can be conjugated and excreted from the body. Epoxide hydrolase functions in both the activation and detoxification of epoxides. Mutations in this gene cause preeclampsia, epoxide hydrolase deficiency or increased epoxide hydrolase activity. Alternatively spliced transcript variants encoding the same protein have been found for this gene. EPHX1 ENSG00000143819
activated leukocyte cell adhesion molecule 214 This gene encodes activated leukocyte cell adhesion molecule (ALCAM), also known as CD166 (cluster of differentiation 166), which is a member of a subfamily of immunoglobulin receptors with five immunoglobulin-like domains (VVC2C2C2) in the extracellular domain. This protein binds to T-cell differentiation antigen CD6, and is implicated in the processes of cell adhesion and migration. Multiple alternatively spliced transcript variants encoding different isoforms have been found. ALCAM ENSG00000170017
solute carrier family 25 member 29 123096 This gene encodes a nuclear-encoded mitochondrial protein that is a member of the large family of solute carrier family 25 (SLC25) mitochondrial transporters. The members of this superfamily are involved in numerous metabolic pathways and cell functions. This gene product was previously reported to be a mitochondrial carnitine-acylcarnitine-like (CACL) translocase (PMID:128829710) or an ornithine transporter (designated ORNT3, PMID:19287344); however, a recent study characterized the main role of this protein as a mitochondrial transporter of basic amino acids, with a preference for arginine and lysine (PMID:24652292). Alternatively spliced transcript variants have been found for this gene. SLC25A29 ENSG00000197119
transglutaminase 2 7052 Transglutaminases are enzymes that catalyze the crosslinking of proteins by epsilon-gamma glutamyl lysine isopeptide bonds. While the primary structure of transglutaminases is not conserved, they all have the same amino acid sequence at their active sites and their activity is calcium-dependent. The protein encoded by this gene acts as a monomer, is induced by retinoic acid, and appears to be involved in apoptosis. Finally, the encoded protein is the autoantigen implicated in celiac disease. Two transcript variants encoding different isoforms have been found for this gene. TGM2 ENSG00000198959
cysteine rich angiogenic inducer 61 3491 The secreted protein encoded by this gene is growth factor-inducible and promotes the adhesion of endothelial cells. The encoded protein interacts with several integrins and with heparan sulfate proteoglycan. This protein also plays a role in cell proliferation, differentiation, angiogenesis, apoptosis, and extracellular matrix formation. CYR61 ENSG00000142871
write.table(as.factor(out$query), paste0("../utilities/GTEX2013_sparse_fac_sqrt/gene_names_clus_",15,".txt"), col.names = FALSE,
            row.names=FALSE, quote=FALSE);
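
Some queries return no match; such rows carry `notfound = TRUE` in the annotation tables of this report, and the `write.table` call above would still write those IDs to the gene list. The following is a minimal sketch of one way to drop them before writing. The helper name `write_matched_ids` and the mock data frame are hypothetical and not part of the original analysis, and the assumption that `notfound` is `TRUE` for unmatched rows and `NA` otherwise is inferred from the tables shown here rather than from the mygene documentation.

```r
# Hypothetical helper (not part of the original analysis): keep only the
# query IDs that were matched, then write them one per line.
write_matched_ids <- function(annot, path) {
  annot <- as.data.frame(annot)
  # Assumption: a `notfound` column is present only when some IDs failed to
  # match, with TRUE for unmatched rows and NA for matched rows.
  if ("notfound" %in% names(annot)) {
    keep <- is.na(annot$notfound) | !annot$notfound
    annot <- annot[keep, , drop = FALSE]
  }
  ids <- as.character(annot$query)
  writeLines(ids, path)
  invisible(ids)
}

# Mock annotation table standing in for the output of mygene::queryMany,
# so the sketch runs without network access.
mock <- data.frame(
  query    = c("ENSG00000092054", "ENSG00000090920", "ENSG00000198125"),
  symbol   = c("MYH7", NA, "MB"),
  notfound = c(NA, TRUE, NA),
  stringsAsFactors = FALSE
)

out_file <- tempfile(fileext = ".txt")
kept <- write_matched_ids(mock, out_file)
```

In the report itself, the same helper could be called as `write_matched_ids(out, path)` in place of the `write.table` call, if excluding unmatched IDs from the downstream gene lists is the desired behaviour.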

Factor 16 Annotations

out <- mygene::queryMany(gene_list[16,],  scopes="ensembl.gene", fields=c("name", "summary", "symbol"), species="human");
## Finished
## Pass returnall=TRUE to return lists of duplicate or missing query terms.
kable(as.data.frame(out))
symbol query summary name X_id notfound
MYH7 ENSG00000092054 Muscle myosin is a hexameric protein containing 2 heavy chain subunits, 2 alkali light chain subunits, and 2 regulatory light chain subunits. This gene encodes the beta (or slow) heavy chain subunit of cardiac myosin. It is expressed predominantly in normal human ventricle. It is also expressed in skeletal muscle tissues rich in slow-twitch type I muscle fibers. Changes in the relative abundance of this protein and the alpha (or fast) heavy subunit of cardiac myosin correlate with the contractile velocity of cardiac muscle. Its expression is also altered during thyroid hormone depletion and hemodynamic overloading. Mutations in this gene are associated with familial hypertrophic cardiomyopathy, myosin storage myopathy, dilated cardiomyopathy, and Laing early-onset distal myopathy. myosin, heavy chain 7, cardiac muscle, beta 4625 NA
IGHA1 ENSG00000211895 NA immunoglobulin heavy constant alpha 1 ENSG00000211895 NA
MB ENSG00000198125 This gene encodes a member of the globin superfamily and is expressed in skeletal and cardiac muscles. The encoded protein is a haemoprotein contributing to intracellular oxygen storage and transcellular facilitated diffusion of oxygen. At least three alternatively spliced transcript variants encoding the same protein have been reported. myoglobin 4151 NA
PIGR ENSG00000162896 This gene is a member of the immunoglobulin superfamily. The encoded poly-Ig receptor binds polymeric immunoglobulin molecules at the basolateral surface of epithelial cells; the complex is then transported across the cell to be secreted at the apical surface. A significant association was found between immunoglobulin A nephropathy and several SNPs in this gene. polymeric immunoglobulin receptor 5284 NA
IGHA2 ENSG00000211890 NA immunoglobulin heavy constant alpha 2 (A2m marker) ENSG00000211890 NA
SFTPB ENSG00000168878 This gene encodes the pulmonary-associated surfactant protein B (SPB), an amphipathic surfactant protein essential for lung function and homeostasis after birth. Pulmonary surfactant is a surface-active lipoprotein complex composed of 90% lipids and 10% proteins which include plasma proteins and apolipoproteins SPA, SPB, SPC and SPD. The surfactant is secreted by the alveolar cells of the lung and maintains the stability of pulmonary tissue by reducing the surface tension of fluids that coat the lung. The SPB enhances the rate of spreading and increases the stability of surfactant monolayers in vitro. Multiple mutations in this gene have been identified, which cause pulmonary surfactant metabolism dysfunction type 1, also called pulmonary alveolar proteinosis due to surfactant protein B deficiency, and are associated with fatal respiratory distress in the neonatal period. Alternatively spliced transcript variants encoding the same protein have been identified. surfactant protein B 6439 NA
KRT19 ENSG00000171345 The protein encoded by this gene is a member of the keratin family. The keratins are intermediate filament proteins responsible for the structural integrity of epithelial cells and are subdivided into cytokeratins and hair keratins. The type I cytokeratins consist of acidic proteins which are arranged in pairs of heterotypic keratin chains. Unlike its related family members, this smallest known acidic cytokeratin is not paired with a basic cytokeratin in epithelial cells. It is specifically expressed in the periderm, the transiently superficial layer that envelopes the developing epidermis. The type I cytokeratins are clustered in a region of chromosome 17q12-q21. keratin 19 3880 NA
SFTPA2 ENSG00000185303 This gene is one of several genes encoding pulmonary-surfactant associated proteins (SFTPA) located on chromosome 10. Mutations in this gene and a highly similar gene located nearby, which affect the highly conserved carbohydrate recognition domain, are associated with idiopathic pulmonary fibrosis. The current version of the assembly displays only a single centromeric SFTPA gene pair rather than the two gene pairs shown in the previous assembly which were thought to have resulted from a duplication. surfactant protein A2 729238 NA
C1QB ENSG00000173369 This gene encodes a major constituent of the human complement subcomponent C1q. C1q associates with C1r and C1s in order to yield the first component of the serum complement system. Deficiency of C1q has been associated with lupus erythematosus and glomerulonephritis. C1q is composed of 18 polypeptide chains: six A-chains, six B-chains, and six C-chains. Each chain contains a collagen-like region located near the N terminus and a C-terminal globular region. The A-, B-, and C-chains are arranged in the order A-C-B on chromosome 1. This gene encodes the B-chain polypeptide of human complement subcomponent C1q. complement component 1, q subcomponent, B chain 713 NA
RNASE1 ENSG00000129538 This gene encodes a member of the pancreatic-type of secretory ribonucleases, a subset of the ribonuclease A superfamily. The encoded endonuclease cleaves internal phosphodiester RNA bonds on the 3’-side of pyrimidine bases. It prefers poly(C) as a substrate and hydrolyzes 2’,3’-cyclic nucleotides, with a pH optimum near 8.0. The encoded protein is monomeric and more commonly acts to degrade ds-RNA over ss-RNA. Alternative splicing occurs at this locus and four transcript variants encoding the same protein have been identified. ribonuclease A family member 1, pancreatic 6035 NA
MYL2 ENSG00000111245 This gene encodes the regulatory light chain associated with cardiac myosin beta (or slow) heavy chain. Ca2+ triggers the phosphorylation of regulatory light chain that in turn triggers contraction. Mutations in this gene are associated with mid-left ventricular chamber type hypertrophic cardiomyopathy. myosin light chain 2 4633 NA
TCAP ENSG00000173991 Sarcomere assembly is regulated by the muscle protein titin. Titin is a giant elastic protein with kinase activity that extends half the length of a sarcomere. It serves as a scaffold to which myofibrils and other muscle related proteins are attached. This gene encodes a protein found in striated and cardiac muscle that binds to the titin Z1-Z2 domains and is a substrate of titin kinase, interactions thought to be critical to sarcomere assembly. Mutations in this gene are associated with limb-girdle muscular dystrophy type 2G. titin-cap 8557 NA
C1QC ENSG00000159189 This gene encodes a major constituent of the human complement subcomponent C1q. C1q associates with C1r and C1s in order to yield the first component of the serum complement system. A deficiency in C1q has been associated with lupus erythematosus and glomerulonephritis. C1q is composed of 18 polypeptide chains: six A-chains, six B-chains, and six C-chains. Each chain contains a collagen-like region located near the N-terminus, and a C-terminal globular region. The A-, B-, and C-chains are arranged in the order A-C-B on chromosome 1. This gene encodes the C-chain polypeptide of human complement subcomponent C1q. Alternatively spliced transcript variants that encode the same protein have been found for this gene. complement component 1, q subcomponent, C chain 714 NA
SFTPA1 ENSG00000122852 This gene encodes a lung surfactant protein that is a member of a subfamily of C-type lectins called collectins. The encoded protein binds specific carbohydrate moieties found on lipids and on the surface of microorganisms. This protein plays an essential role in surfactant homeostasis and in the defense against respiratory pathogens. Mutations in this gene are associated with idiopathic pulmonary fibrosis. Alternate splicing results in multiple transcript variants. surfactant protein A1 653509 NA
SCD ENSG00000099194 This gene encodes an enzyme involved in fatty acid biosynthesis, primarily the synthesis of oleic acid. The protein belongs to the fatty acid desaturase family and is an integral membrane protein located in the endoplasmic reticulum. Transcripts of approximately 3.9 and 5.2 kb, differing only by alternative polyadenylation signals, have been detected. A gene encoding a similar enzyme is located on chromosome 4 and a pseudogene of this gene is located on chromosome 17. stearoyl-CoA desaturase 6319 NA
SFTPC ENSG00000168484 This gene encodes the pulmonary-associated surfactant protein C (SPC), an extremely hydrophobic surfactant protein essential for lung function and homeostasis after birth. Pulmonary surfactant is a surface-active lipoprotein complex composed of 90% lipids and 10% proteins which include plasma proteins and apolipoproteins SPA, SPB, SPC and SPD. The surfactant is secreted by the alveolar cells of the lung and maintains the stability of pulmonary tissue by reducing the surface tension of fluids that coat the lung. Multiple mutations in this gene have been identified, which cause pulmonary surfactant metabolism dysfunction type 2, also called pulmonary alveolar proteinosis due to surfactant protein C deficiency, and are associated with interstitial lung disease in older infants, children, and adults. Alternatively spliced transcript variants encoding different protein isoforms have been identified. surfactant protein C 6440 NA
RPL13 ENSG00000167526 Ribosomes, the organelles that catalyze protein synthesis, consist of a small 40S subunit and a large 60S subunit. Together these subunits are composed of 4 RNA species and approximately 80 structurally distinct proteins. This gene encodes a ribosomal protein that is a component of the 60S subunit. The protein belongs to the L13E family of ribosomal proteins. It is located in the cytoplasm. This gene is expressed at significantly higher levels in benign breast lesions than in breast carcinomas. Alternatively spliced transcript variants encoding distinct isoforms have been found for this gene. As is typical for genes encoding ribosomal proteins, there are multiple processed pseudogenes of this gene dispersed through the genome. ribosomal protein L13 6137 NA
KRT10 ENSG00000186395 This gene encodes a member of the type I (acidic) cytokeratin family, which belongs to the superfamily of intermediate filament (IF) proteins. Keratins are heteropolymeric structural proteins which form the intermediate filament. These filaments, along with actin microfilaments and microtubules, compose the cytoskeleton of epithelial cells. Mutations in this gene are associated with epidermolytic hyperkeratosis. This gene is located within a cluster of keratin family members on chromosome 17q21. keratin 10 3858 NA
LGALS4 ENSG00000171747 The galectins are a family of beta-galactoside-binding proteins implicated in modulating cell-cell and cell-matrix interactions. The expression of this gene is restricted to small intestine, colon, and rectum, and it is underexpressed in colorectal cancer. galectin 4 3960 NA
CA2 ENSG00000104267 The protein encoded by this gene is one of several isozymes of carbonic anhydrase, which catalyzes reversible hydration of carbon dioxide. Defects in this enzyme are associated with osteopetrosis and renal tubular acidosis. Two transcript variants encoding different isoforms have been found for this gene. carbonic anhydrase 2 760 NA
PTPRF ENSG00000142949 The protein encoded by this gene is a member of the protein tyrosine phosphatase (PTP) family. PTPs are known to be signaling molecules that regulate a variety of cellular processes including cell growth, differentiation, mitotic cycle, and oncogenic transformation. This PTP possesses an extracellular region, a single transmembrane region, and two tandem intracytoplasmic catalytic domains, and thus represents a receptor-type PTP. The extracellular region contains three Ig-like domains, and nine non-Ig like domains similar to that of neural-cell adhesion molecule. This PTP was shown to function in the regulation of epithelial cell-cell contacts at adherens junctions, as well as in the control of beta-catenin signaling. An increased expression level of this protein was found in the insulin-responsive tissue of obese, insulin-resistant individuals, and may contribute to the pathogenesis of insulin resistance. Two alternatively spliced transcript variants of this gene, which encode distinct proteins, have been reported. protein tyrosine phosphatase, receptor type F 5792 NA
CKM ENSG00000104879 The protein encoded by this gene is a cytoplasmic enzyme involved in energy homeostasis and is an important serum marker for myocardial infarction. The encoded protein reversibly catalyzes the transfer of phosphate between ATP and various phosphogens such as creatine phosphate. It acts as a homodimer in striated muscle as well as in other tissues, and as a heterodimer with a similar brain isozyme in heart. The encoded protein is a member of the ATP:guanido phosphotransferase protein family. creatine kinase, M-type 1158 NA
C3 ENSG00000125730 Complement component C3 plays a central role in the activation of complement system. Its activation is required for both classical and alternative complement activation pathways. The encoded preproprotein is proteolytically processed to generate alpha and beta subunits that form the mature protein, which is then further processed to generate numerous peptide products. The C3a peptide, also known as the C3a anaphylatoxin, modulates inflammation and possesses antimicrobial activity. Mutations in this gene are associated with atypical hemolytic uremic syndrome and age-related macular degeneration in human patients. complement component 3 718 NA
CD163 ENSG00000177575 The protein encoded by this gene is a member of the scavenger receptor cysteine-rich (SRCR) superfamily, and is exclusively expressed in monocytes and macrophages. It functions as an acute phase-regulated receptor involved in the clearance and endocytosis of hemoglobin/haptoglobin complexes by macrophages, and may thereby protect tissues from free hemoglobin-mediated oxidative damage. This protein may also function as an innate immune sensor for bacteria and inducer of local inflammation. Alternatively spliced transcript variants encoding different isoforms have been described for this gene. CD163 molecule 9332 NA
COL1A1 ENSG00000108821 This gene encodes the pro-alpha1 chains of type I collagen whose triple helix comprises two alpha1 chains and one alpha2 chain. Type I is a fibril-forming collagen found in most connective tissues and is abundant in bone, cornea, dermis and tendon. Mutations in this gene are associated with osteogenesis imperfecta types I-IV, Ehlers-Danlos syndrome type VIIA, Ehlers-Danlos syndrome Classical type, Caffey Disease and idiopathic osteoporosis. Reciprocal translocations between chromosomes 17 and 22, where this gene and the gene for platelet-derived growth factor beta are located, are associated with a particular type of skin tumor called dermatofibrosarcoma protuberans, resulting from unregulated expression of the growth factor. Two transcripts, resulting from the use of alternate polyadenylation signals, have been identified for this gene. collagen type I alpha 1 1277 NA
SELENBP1 ENSG00000143416 This gene encodes a member of the selenium-binding protein family. Selenium is an essential nutrient that exhibits potent anticarcinogenic properties, and deficiency of selenium may cause certain neurologic diseases. The effects of selenium in preventing cancer and neurologic diseases may be mediated by selenium-binding proteins, and decreased expression of this gene may be associated with several types of cancer. The encoded protein may play a selenium-dependent role in ubiquitination/deubiquitination-mediated protein degradation. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene. selenium binding protein 1 8991 NA
YBX3 ENSG00000060138 NA Y-box binding protein 3 8531 NA
C1QA ENSG00000173372 This gene encodes a major constituent of the human complement subcomponent C1q. C1q associates with C1r and C1s in order to yield the first component of the serum complement system. Deficiency of C1q has been associated with lupus erythematosus and glomerulonephritis. C1q is composed of 18 polypeptide chains: six A-chains, six B-chains, and six C-chains. Each chain contains a collagen-like region located near the N terminus and a C-terminal globular region. The A-, B-, and C-chains are arranged in the order A-C-B on chromosome 1. This gene encodes the A-chain polypeptide of human complement subcomponent C1q. complement component 1, q subcomponent, A chain 712 NA
TPM2 ENSG00000198467 This gene encodes beta-tropomyosin, a member of the actin filament binding protein family, and mainly expressed in slow, type 1 muscle fibers. Mutations in this gene can alter the expression of other sarcomeric tropomyosin proteins, and cause cap disease, nemaline myopathy and distal arthrogryposis syndromes. Alternatively spliced transcript variants encoding different isoforms have been found for this gene. tropomyosin 2 (beta) 7169 NA
CD44 ENSG00000026508 The protein encoded by this gene is a cell-surface glycoprotein involved in cell-cell interactions, cell adhesion and migration. It is a receptor for hyaluronic acid (HA) and can also interact with other ligands, such as osteopontin, collagens, and matrix metalloproteinases (MMPs). This protein participates in a wide variety of cellular functions including lymphocyte activation, recirculation and homing, hematopoiesis, and tumor metastasis. Transcripts for this gene undergo complex alternative splicing that results in many functionally distinct isoforms; however, the full-length nature of some of these variants has not been determined. Alternative splicing is the basis for the structural and functional diversity of this protein, and may be related to tumor metastasis. CD44 molecule (Indian blood group) 960 NA
FLNC ENSG00000128591 This gene encodes one of three related filamin genes, specifically gamma filamin. These filamin proteins crosslink actin filaments into orthogonal networks in cortical cytoplasm and participate in the anchoring of membrane proteins for the actin cytoskeleton. Three functional domains exist in filamin: an N-terminal filamentous actin-binding domain, a C-terminal self-association domain, and a membrane glycoprotein-binding domain. Two transcript variants encoding different isoforms have been found for this gene. filamin C 2318 NA
NA ENSG00000090920 NA NA NA TRUE
RPS18 ENSG00000231500 Ribosomes, the organelles that catalyze protein synthesis, consist of a small 40S subunit and a large 60S subunit. Together these subunits are composed of 4 RNA species and approximately 80 structurally distinct proteins. This gene encodes a ribosomal protein that is a component of the 40S subunit. The protein belongs to the S13P family of ribosomal proteins. It is located in the cytoplasm. The gene product of the E. coli ortholog (ribosomal protein S13) is involved in the binding of fMet-tRNA, and thus, in the initiation of translation. This gene is an ortholog of mouse Ke3. As is typical for genes encoding ribosomal proteins, there are multiple processed pseudogenes of this gene dispersed through the genome. ribosomal protein S18 6222 NA
ACSL5 ENSG00000197142 The protein encoded by this gene is an isozyme of the long-chain fatty-acid-coenzyme A ligase family. Although differing in substrate specificity, subcellular localization, and tissue distribution, all isozymes of this family convert free long-chain fatty acids into fatty acyl-CoA esters, and thereby play a key role in lipid biosynthesis and fatty acid degradation. This isozyme is highly expressed in uterus and spleen, and in trace amounts in normal brain, but has markedly increased levels in malignant gliomas. This gene functions in mediating fatty acid-induced glioma cell growth. Three transcript variants encoding two different isoforms have been found for this gene. acyl-CoA synthetase long-chain family member 5 51703 NA
COL1A2 ENSG00000164692 This gene encodes the pro-alpha2 chain of type I collagen whose triple helix comprises two alpha1 chains and one alpha2 chain. Type I is a fibril-forming collagen found in most connective tissues and is abundant in bone, cornea, dermis and tendon. Mutations in this gene are associated with osteogenesis imperfecta types I-IV, Ehlers-Danlos syndrome type VIIB, recessive Ehlers-Danlos syndrome Classical type, idiopathic osteoporosis, and atypical Marfan syndrome. Symptoms associated with mutations in this gene, however, tend to be less severe than mutations in the gene for the alpha1 chain of type I collagen (COL1A1) reflecting the different role of alpha2 chains in matrix integrity. Three transcripts, resulting from the use of alternate polyadenylation signals, have been identified for this gene. collagen type I alpha 2 chain 1278 NA
TPM1 ENSG00000140416 This gene is a member of the tropomyosin family of highly conserved, widely distributed actin-binding proteins involved in the contractile system of striated and smooth muscles and the cytoskeleton of non-muscle cells. Tropomyosin is composed of two alpha-helical chains arranged as a coiled-coil. It is polymerized end to end along the two grooves of actin filaments and provides stability to the filaments. The encoded protein is one type of alpha helical chain that forms the predominant tropomyosin of striated muscle, where it also functions in association with the troponin complex to regulate the calcium-dependent interaction of actin and myosin during muscle contraction. In smooth muscle and non-muscle cells, alternatively spliced transcript variants encoding a range of isoforms have been described. Mutations in this gene are associated with type 3 familial hypertrophic cardiomyopathy. tropomyosin 1 (alpha) 7168 NA
SERPINA1 ENSG00000197249 The protein encoded by this gene is secreted and is a serine protease inhibitor whose targets include elastase, plasmin, thrombin, trypsin, chymotrypsin, and plasminogen activator. Defects in this gene can cause emphysema or liver disease. Several transcript variants encoding the same protein have been found for this gene. serpin family A member 1 5265 NA
KRT1 ENSG00000167768 The protein encoded by this gene is a member of the keratin gene family. The type II cytokeratins consist of basic or neutral proteins which are arranged in pairs of heterotypic keratin chains coexpressed during differentiation of simple and stratified epithelial tissues. This type II cytokeratin is specifically expressed in the spinous and granular layers of the epidermis with family member KRT10 and mutations in these genes have been associated with bullous congenital ichthyosiform erythroderma. The type II cytokeratins are clustered in a region of chromosome 12q12-q13. keratin 1 3848 NA
TNNC1 ENSG00000114854 Troponin is a central regulatory protein of striated muscle contraction, and together with tropomyosin, is located on the actin filament. Troponin consists of 3 subunits: TnI, which is the inhibitor of actomyosin ATPase; TnT, which contains the binding site for tropomyosin; and TnC, the protein encoded by this gene. The binding of calcium to TnC abolishes the inhibitory action of TnI, thus allowing the interaction of actin with myosin, the hydrolysis of ATP, and the generation of tension. Mutations in this gene are associated with cardiomyopathy dilated type 1Z. troponin C1, slow skeletal and cardiac type 7134 NA
PCK2 ENSG00000100889 This gene encodes a mitochondrial enzyme that catalyzes the conversion of oxaloacetate to phosphoenolpyruvate in the presence of guanosine triphosphate (GTP). A cytosolic form of this protein is encoded by a different gene and is the key enzyme of gluconeogenesis in the liver. Alternatively spliced transcript variants have been described. phosphoenolpyruvate carboxykinase 2, mitochondrial 5106 NA
MTCO1P12 ENSG00000237973 NA MT-CO1 pseudogene 12 ENSG00000237973 NA
GAPDH ENSG00000111640 This gene encodes a member of the glyceraldehyde-3-phosphate dehydrogenase protein family. The encoded protein has been identified as a moonlighting protein based on its ability to perform mechanistically distinct functions. The product of this gene catalyzes an important energy-yielding step in carbohydrate metabolism, the reversible oxidative phosphorylation of glyceraldehyde-3-phosphate in the presence of inorganic phosphate and nicotinamide adenine dinucleotide (NAD). The encoded protein has additionally been identified to have uracil DNA glycosylase activity in the nucleus. Also, this protein contains a peptide that has antimicrobial activity against E. coli, P. aeruginosa, and C. albicans. Studies of a similar protein in mouse have assigned a variety of additional functions including nitrosylation of nuclear proteins, the regulation of mRNA stability, and acting as a transferrin receptor on the cell surface of macrophage. Many pseudogenes similar to this locus are present in the human genome. Alternative splicing results in multiple transcript variants. glyceraldehyde-3-phosphate dehydrogenase 2597 NA
MYH11 ENSG00000133392 The protein encoded by this gene is a smooth muscle myosin belonging to the myosin heavy chain family. The gene product is a subunit of a hexameric protein that consists of two heavy chain subunits and two pairs of non-identical light chain subunits. It functions as a major contractile protein, converting chemical energy into mechanical energy through the hydrolysis of ATP. The gene encoding a human ortholog of rat NUDE1 is transcribed from the reverse strand of this gene, and its 3’ end overlaps with that of the latter. The pericentric inversion of chromosome 16 [inv(16)(p13q22)] produces a chimeric transcript that encodes a protein consisting of the first 165 residues from the N terminus of core-binding factor beta in a fusion with the C-terminal portion of the smooth muscle myosin heavy chain. This chromosomal rearrangement is associated with acute myeloid leukemia of the M4Eo subtype. Alternative splicing generates isoforms that are differentially expressed, with ratios changing during muscle cell maturation. Alternatively spliced transcript variants encoding different isoforms have been identified. myosin, heavy chain 11, smooth muscle 4629 NA
PLS1 ENSG00000120756 Plastins are a family of actin-binding proteins that are conserved throughout eukaryote evolution and expressed in most tissues of higher eukaryotes. In humans, two ubiquitous plastin isoforms (L and T) have been identified. The protein encoded by this gene is a third distinct plastin isoform, which is specifically expressed at high levels in the small intestine. Alternatively spliced transcript variants varying in the 5’ UTR, but encoding the same protein, have been found for this gene. A pseudogene of this gene is found on chromosome 11. plastin 1 5357 NA
GPX2 ENSG00000176153 This gene is a member of the glutathione peroxidase family and encodes a selenium-dependent glutathione peroxidase that is one of two isoenzymes responsible for the majority of the glutathione-dependent hydrogen peroxide-reducing activity in the epithelium of the gastrointestinal tract. The protein encoded by this locus contains a selenocysteine (Sec) residue encoded by the UGA codon, which normally signals translation termination. Alternatively spliced transcript variants have been described. glutathione peroxidase 2 2877 NA
MUC1 ENSG00000185499 This gene encodes a membrane-bound protein that is a member of the mucin family. Mucins are O-glycosylated proteins that play an essential role in forming protective mucous barriers on epithelial surfaces. These proteins also play a role in intracellular signaling. This protein is expressed on the apical surface of epithelial cells that line the mucosal surfaces of many different tissues including lung, breast, stomach and pancreas. This protein is proteolytically cleaved into alpha and beta subunits that form a heterodimeric complex. The N-terminal alpha subunit functions in cell-adhesion and the C-terminal beta subunit is involved in cell signaling. Overexpression, aberrant intracellular localization, and changes in glycosylation of this protein have been associated with carcinomas. This gene is known to contain a highly polymorphic variable number tandem repeats (VNTR) domain. Alternate splicing results in multiple transcript variants. mucin 1, cell surface associated 4582 NA
LLGL2 ENSG00000073350 The lethal (2) giant larvae protein of Drosophila plays a role in asymmetric cell division, epithelial cell polarity, and cell migration. This human gene encodes a protein similar to lethal (2) giant larvae of Drosophila. In fly, the protein’s ability to localize cell fate determinants is regulated by the atypical protein kinase C (aPKC). In human, this protein interacts with aPKC-containing complexes and is cortically localized in mitotic cells. Alternative splicing results in multiple transcript variants encoding different isoforms. LLGL2, scribble cell polarity complex component 3993 NA
CES2 ENSG00000172831 This gene encodes a member of the carboxylesterase large family. The family members are responsible for the hydrolysis or transesterification of various xenobiotics, such as cocaine and heroin, and endogenous substrates with ester, thioester, or amide bonds. They may participate in fatty acyl and cholesterol ester metabolism, and may play a role in the blood-brain barrier system. The protein encoded by this gene is the major intestinal enzyme and functions in intestine drug clearance. Alternatively spliced transcript variants have been found for this gene. carboxylesterase 2 8824 NA
RPS3 ENSG00000149273 Ribosomes, the organelles that catalyze protein synthesis, consist of a small 40S subunit and a large 60S subunit. Together these subunits are composed of 4 RNA species and approximately 80 structurally distinct proteins. This gene encodes a ribosomal protein that is a component of the 40S subunit, where it forms part of the domain where translation is initiated. The protein belongs to the S3P family of ribosomal proteins. Studies of the mouse and rat proteins have demonstrated that the protein has an extraribosomal role as an endonuclease involved in the repair of UV-induced DNA damage. The protein appears to be located in both the cytoplasm and nucleus but not in the nucleolus. Higher levels of expression of this gene in colon adenocarcinomas and adenomatous polyps compared to adjacent normal colonic mucosa have been observed. This gene is co-transcribed with the small nucleolar RNA genes U15A and U15B, which are located in its first and fifth introns, respectively. As is typical for genes encoding ribosomal proteins, there are multiple processed pseudogenes of this gene dispersed through the genome. Multiple alternatively spliced transcript variants encoding different isoforms have been found for this gene. ribosomal protein S3 6188 NA
C1orf115 ENSG00000162817 NA chromosome 1 open reading frame 115 79762 NA
RP11-510N19.5 ENSG00000249007 NA NA ENSG00000249007 NA
FBLN1 ENSG00000077942 Fibulin 1 is a secreted glycoprotein that becomes incorporated into a fibrillar extracellular matrix. Calcium-binding is apparently required to mediate its binding to laminin and nidogen. It mediates platelet adhesion via binding fibrinogen. Four splice variants which differ in the 3’ end have been identified. Each variant encodes a different isoform, but no functional distinctions have been identified among the four variants. fibulin 1 2192 NA
CCDC80 ENSG00000091986 NA coiled-coil domain containing 80 151887 NA
ACTC1 ENSG00000159251 Actins are highly conserved proteins that are involved in various types of cell motility. Polymerization of globular actin (G-actin) leads to a structural filament (F-actin) in the form of a two-stranded helix. Each actin can bind to four others. The protein encoded by this gene belongs to the actin family which is comprised of three main groups of actin isoforms, alpha, beta, and gamma. The alpha actins are found in muscle tissues and are a major constituent of the contractile apparatus. Defects in this gene have been associated with idiopathic dilated cardiomyopathy (IDC) and familial hypertrophic cardiomyopathy (FHC). actin, alpha, cardiac muscle 1 70 NA
RBM47 ENSG00000163694 NA RNA binding motif protein 47 54502 NA
SLC26A2 ENSG00000155850 The diastrophic dysplasia sulfate transporter is a transmembrane glycoprotein implicated in the pathogenesis of several human chondrodysplasias. It apparently is critical in cartilage for sulfation of proteoglycans and matrix organization. solute carrier family 26 member 2 1836 NA
LSR ENSG00000105699 NA lipolysis stimulated lipoprotein receptor 51599 NA
FZD5 ENSG00000163251 Members of the ‘frizzled’ gene family encode 7-transmembrane domain proteins that are receptors for Wnt signaling proteins. The FZD5 protein is believed to be the receptor for the Wnt5A ligand. frizzled class receptor 5 7855 NA
CNDP2 ENSG00000133313 CNDP2, also known as tissue carnosinase and peptidase A (EC 3.4.13.18), is a nonspecific dipeptidase rather than a selective carnosinase (Teufel et al., 2003 [PubMed 12473676]). CNDP dipeptidase 2 (metallopeptidase M20 family) 55748 NA
KRT2 ENSG00000172867 The protein encoded by this gene is a member of the keratin gene family. The type II cytokeratins consist of basic or neutral proteins which are arranged in pairs of heterotypic keratin chains coexpressed during differentiation of simple and stratified epithelial tissues. This type II cytokeratin is expressed largely in the upper spinous layer of epidermal keratinocytes and mutations in this gene have been associated with bullous congenital ichthyosiform erythroderma. The type II cytokeratins are clustered in a region of chromosome 12q12-q13. keratin 2 3849 NA
RETSAT ENSG00000042445 NA retinol saturase 54884 NA
RPS19 ENSG00000105372 Ribosomes, the organelles that catalyze protein synthesis, consist of a small 40S subunit and a large 60S subunit. Together these subunits are composed of 4 RNA species and approximately 80 structurally distinct proteins. This gene encodes a ribosomal protein that is a component of the 40S subunit. The protein belongs to the S19E family of ribosomal proteins. It is located in the cytoplasm. Mutations in this gene cause Diamond-Blackfan anemia (DBA), a constitutional erythroblastopenia characterized by absent or decreased erythroid precursors, in a subset of patients. This suggests a possible extra-ribosomal function for this gene in erythropoietic differentiation and proliferation, in addition to its ribosomal function. Higher expression levels of this gene in some primary colon carcinomas compared to matched normal colon tissues have been observed. As is typical for genes encoding ribosomal proteins, there are multiple processed pseudogenes of this gene dispersed through the genome. ribosomal protein S19 6223 NA
ACTN2 ENSG00000077522 Alpha actinins belong to the spectrin gene superfamily which represents a diverse group of cytoskeletal proteins, including the alpha and beta spectrins and dystrophins. Alpha actinin is an actin-binding protein with multiple roles in different cell types. In nonmuscle cells, the cytoskeletal isoform is found along microfilament bundles and adherens-type junctions, where it is involved in binding actin to the membrane. In contrast, skeletal, cardiac, and smooth muscle isoforms are localized to the Z-disc and analogous dense bodies, where they help anchor the myofibrillar actin filaments. This gene encodes a muscle-specific, alpha actinin isoform that is expressed in both skeletal and cardiac muscles. Several transcript variants encoding different isoforms have been found for this gene. actinin alpha 2 88 NA
TTN ENSG00000155657 This gene encodes a large abundant protein of striated muscle. The product of this gene is divided into two regions, a N-terminal I-band and a C-terminal A-band. The I-band, which is the elastic part of the molecule, contains two regions of tandem immunoglobulin domains on either side of a PEVK region that is rich in proline, glutamate, valine and lysine. The A-band, which is thought to act as a protein-ruler, contains a mixture of immunoglobulin and fibronectin repeats, and possesses kinase activity. An N-terminal Z-disc region and a C-terminal M-line region bind to the Z-line and M-line of the sarcomere, respectively, so that a single titin molecule spans half the length of a sarcomere. Titin also contains binding sites for muscle associated proteins so it serves as an adhesion template for the assembly of contractile machinery in muscle cells. It has also been identified as a structural protein for chromosomes. Alternative splicing of this gene results in multiple transcript variants. Considerable variability exists in the I-band, the M-line and the Z-disc regions of titin. Variability in the I-band region contributes to the differences in elasticity of different titin isoforms and, therefore, to the differences in elasticity of different muscle types. Mutations in this gene are associated with familial hypertrophic cardiomyopathy 9, and autoantibodies to titin are produced in patients with the autoimmune disease scleroderma. titin 7273 NA
CMYA5 ENSG00000164309 NA cardiomyopathy associated 5 202333 NA
MT1G ENSG00000125144 NA metallothionein 1G 4495 NA
PDE4DIP ENSG00000178104 The protein encoded by this gene serves to anchor phosphodiesterase 4D to the Golgi/centrosome region of the cell. Defects in this gene may be a cause of myeloproliferative disorder (MBD) associated with eosinophilia. Several transcript variants encoding different isoforms have been found for this gene. phosphodiesterase 4D interacting protein 9659 NA
RPS8 ENSG00000142937 Ribosomes, the organelles that catalyze protein synthesis, consist of a small 40S subunit and a large 60S subunit. Together these subunits are composed of 4 RNA species and approximately 80 structurally distinct proteins. This gene encodes a ribosomal protein that is a component of the 40S subunit. The protein belongs to the S8E family of ribosomal proteins. It is located in the cytoplasm. Increased expression of this gene in colorectal tumors and colon polyps compared to matched normal colonic mucosa has been observed. This gene is co-transcribed with the small nucleolar RNA genes U38A, U38B, U39, and U40, which are located in its fourth, fifth, first, and second introns, respectively. As is typical for genes encoding ribosomal proteins, there are multiple processed pseudogenes of this gene dispersed through the genome. ribosomal protein S8 6202 NA
CYP3A5 ENSG00000106258 This gene encodes a member of the cytochrome P450 superfamily of enzymes. The cytochrome P450 proteins are monooxygenases which catalyze many reactions involved in drug metabolism and synthesis of cholesterol, steroids and other lipids. The encoded protein metabolizes drugs as well as the steroid hormones testosterone and progesterone. This gene is part of a cluster of cytochrome P450 genes on chromosome 7q21.1. Two pseudogenes of this gene have been identified within this cluster on chromosome 7. Expression of this gene is widely variable among populations, and a single nucleotide polymorphism that affects transcript splicing has been associated with susceptibility to hypertension. Alternative splicing results in multiple transcript variants. cytochrome P450 family 3 subfamily A member 5 1577 NA
ABCC3 ENSG00000108846 The protein encoded by this gene is a member of the superfamily of ATP-binding cassette (ABC) transporters. ABC proteins transport various molecules across extra- and intra-cellular membranes. ABC genes are divided into seven distinct subfamilies (ABC1, MDR/TAP, MRP, ALD, OABP, GCN20, White). This protein is a member of the MRP subfamily which is involved in multi-drug resistance. The specific function of this protein has not yet been determined; however, this protein may play a role in the transport of biliary and intestinal excretion of organic anions. Alternatively spliced variants which encode different protein isoforms have been described; however, not all variants have been fully characterized. ATP binding cassette subfamily C member 3 8714 NA
MRC2 ENSG00000011028 This gene encodes a member of the mannose receptor family of proteins that contain a fibronectin type II domain and multiple C-type lectin-like domains. The encoded protein plays a role in extracellular matrix remodeling by mediating the internalization and lysosomal degradation of collagen ligands. Expression of this gene may play a role in the tumorigenesis and metastasis of several malignancies including breast cancer, gliomas and metastatic bone disease. mannose receptor C type 2 9902 NA
TMEM37 ENSG00000171227 NA transmembrane protein 37 140738 NA
COX6A2 ENSG00000156885 Cytochrome c oxidase (COX), the terminal enzyme of the mitochondrial respiratory chain, catalyzes the electron transfer from reduced cytochrome c to oxygen. It is a heteromeric complex consisting of 3 catalytic subunits encoded by mitochondrial genes and multiple structural subunits encoded by nuclear genes. The mitochondrially-encoded subunits function in electron transfer, and the nuclear-encoded subunits may be involved in the regulation and assembly of the complex. This nuclear gene encodes polypeptide 2 (heart/muscle isoform) of subunit VIa, and polypeptide 2 is present only in striated muscles. Polypeptide 1 (liver isoform) of subunit VIa is encoded by a different gene, and is found in all non-muscle tissues. These two polypeptides share 66% amino acid sequence identity. cytochrome c oxidase subunit 6A2 1339 NA
COL7A1 ENSG00000114270 This gene encodes the alpha chain of type VII collagen. The type VII collagen fibril, composed of three identical alpha collagen chains, is restricted to the basement zone beneath stratified squamous epithelia. It functions as an anchoring fibril between the external epithelia and the underlying stroma. Mutations in this gene are associated with all forms of dystrophic epidermolysis bullosa. In the absence of mutations, however, an acquired form of this disease can result from an autoimmune response made to type VII collagen. collagen type VII alpha 1 1294 NA
GLUL ENSG00000135821 The protein encoded by this gene belongs to the glutamine synthetase family. It catalyzes the synthesis of glutamine from glutamate and ammonia in an ATP-dependent reaction. This protein plays a role in ammonia and glutamate detoxification, acid-base homeostasis, cell signaling, and cell proliferation. Glutamine is an abundant amino acid, and is important to the biosynthesis of several amino acids, pyrimidines, and purines. Mutations in this gene are associated with congenital glutamine deficiency, and overexpression of this gene was observed in some primary liver cancer samples. There are six pseudogenes of this gene found on chromosomes 2, 5, 9, 11, and 12. Alternative splicing results in multiple transcript variants. glutamate-ammonia ligase 2752 NA
FMO5 ENSG00000131781 Metabolic N-oxidation of the diet-derived amino-trimethylamine (TMA) is mediated by flavin-containing monooxygenase and is subject to an inherited FMO3 polymorphism in man resulting in a small subpopulation with reduced TMA N-oxidation capacity resulting in fish odor syndrome Trimethylaminuria. Three forms of the enzyme, FMO1 found in fetal liver, FMO2 found in adult liver, and FMO3 are encoded by genes clustered in the 1q23-q25 region. Flavin-containing monooxygenases are NADPH-dependent flavoenzymes that catalyze the oxidation of soft nucleophilic heteroatom centers in drugs, pesticides, and xenobiotics. Alternative splicing results in multiple transcript variants. flavin containing monooxygenase 5 2330 NA
EZR ENSG00000092820 The cytoplasmic peripheral membrane protein encoded by this gene functions as a protein-tyrosine kinase substrate in microvilli. As a member of the ERM protein family, this protein serves as an intermediate between the plasma membrane and the actin cytoskeleton. This protein plays a key role in cell surface structure adhesion, migration and organization, and it has been implicated in various human cancers. A pseudogene located on chromosome 3 has been identified for this gene. Alternatively spliced variants have also been described for this gene. ezrin 7430 NA
COL6A2 ENSG00000142173 This gene encodes one of the three alpha chains of type VI collagen, a beaded filament collagen found in most connective tissues. The product of this gene contains several domains similar to von Willebrand Factor type A domains. These domains have been shown to bind extracellular matrix proteins, an interaction that explains the importance of this collagen in organizing matrix components. Mutations in this gene are associated with Bethlem myopathy and Ullrich scleroatonic muscular dystrophy. Three transcript variants have been identified for this gene. collagen type VI alpha 2 1292 NA
SGK1 ENSG00000118515 This gene encodes a serine/threonine protein kinase that plays an important role in cellular stress response. This kinase activates certain potassium, sodium, and chloride channels, suggesting an involvement in the regulation of processes such as cell survival, neuronal excitability, and renal sodium excretion. High levels of expression of this gene may contribute to conditions such as hypertension and diabetic nephropathy. Several alternatively spliced transcript variants encoding different isoforms have been noted for this gene. serum/glucocorticoid regulated kinase 1 6446 NA
RPL18 ENSG00000063177 Ribosomes, the organelles that catalyze protein synthesis, consist of a small 40S subunit and a large 60S subunit. Together these subunits are composed of 4 RNA species and approximately 80 structurally distinct proteins. This gene encodes a member of the L18E family of ribosomal proteins that is a component of the 60S subunit. As is typical for genes encoding ribosomal proteins, there are multiple processed pseudogenes of this gene dispersed through the genome. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene. ribosomal protein L18 6141 NA
IGHM ENSG00000211899 NA immunoglobulin heavy constant mu ENSG00000211899 NA
PBLD ENSG00000108187 NA phenazine biosynthesis like protein domain containing 64081 NA
RPS11 ENSG00000142534 Ribosomes, the organelles that catalyze protein synthesis, consist of a small 40S subunit and a large 60S subunit. Together these subunits are composed of 4 RNA species and approximately 80 structurally distinct proteins. This gene encodes a member of the S17P family of ribosomal proteins that is a component of the 40S subunit. This gene is co-transcribed with the small nucleolar RNA gene U35B, which is located in the third intron. As is typical for genes encoding ribosomal proteins, there are multiple processed pseudogenes of this gene dispersed throughout the genome. ribosomal protein S11 6205 NA
CHGA ENSG00000100604 The protein encoded by this gene is a member of the chromogranin/secretogranin family of neuroendocrine secretory proteins. It is found in secretory vesicles of neurons and endocrine cells. This gene product is a precursor to three biologically active peptides; vasostatin, pancreastatin, and parastatin. These peptides act as autocrine or paracrine negative modulators of the neuroendocrine system. Two other peptides, catestatin and chromofungin, have antimicrobial activity and antifungal activity, respectively. Two transcript variants encoding different isoforms have been found for this gene. chromogranin A 1113 NA
TSPAN13 ENSG00000106537 The protein encoded by this gene is a member of the transmembrane 4 superfamily, also known as the tetraspanin family. Most of these members are cell-surface proteins that are characterized by the presence of four hydrophobic domains. The proteins mediate signal transduction events that play a role in the regulation of cell development, activation, growth and motility. tetraspanin 13 27075 NA
ST14 ENSG00000149418 The protein encoded by this gene is an epithelial-derived, integral membrane serine protease. This protease forms a complex with the Kunitz-type serine protease inhibitor, HAI-1, and is found to be activated by sphingosine 1-phosphate. This protease has been shown to cleave and activate hepatocyte growth factor/scattering factor, and urokinase plasminogen activator, which suggest the function of this protease as an epithelial membrane activator for other proteases and latent growth factors. The expression of this protease has been associated with breast, colon, prostate, and ovarian tumors, which implicates its role in cancer invasion, and metastasis. suppression of tumorigenicity 14 6768 NA
CLDN7 ENSG00000181885 This gene encodes a member of the claudin family. Claudins are integral membrane proteins and components of tight junction strands. Tight junction strands serve as a physical barrier to prevent solutes and water from passing freely through the paracellular space between epithelial or endothelial cell sheets, and also play critical roles in maintaining cell polarity and signal transductions. Differential expression of this gene has been observed in different types of malignancies, including breast cancer, ovarian cancer, hepatocellular carcinomas, urinary tumors, prostate cancer, lung cancer, head and neck cancers, thyroid carcinomas, etc. Alternatively spliced transcript variants encoding different isoforms have been found. claudin 7 1366 NA
IGLC1 ENSG00000211675 NA immunoglobulin lambda constant 1 (Mcg marker) ENSG00000211675 NA
RPL10A ENSG00000198755 Ribosomes, the organelles that catalyze protein synthesis, consist of a small 40S subunit and a large 60S subunit. Together these subunits are composed of 4 RNA species and approximately 80 structurally distinct proteins. This gene encodes a ribosomal protein that is a component of the 60S subunit. The protein belongs to the L1P family of ribosomal proteins. It is located in the cytoplasm. The expression of this gene is downregulated in the thymus by cyclosporin-A (CsA), an immunosuppressive drug. Studies in mice have shown that the expression of the ribosomal protein L10a gene is downregulated in neural precursor cells during development. This gene previously was referred to as NEDD6 (neural precursor cell expressed, developmentally downregulated 6), but it has been renamed RPL10A (ribosomal protein 10a). As is typical for genes encoding ribosomal proteins, there are multiple processed pseudogenes of this gene dispersed through the genome. ribosomal protein L10a 4736 NA
SAA2 ENSG00000134339 NA serum amyloid A2 6289 NA
STAB1 ENSG00000010327 This gene encodes a large, transmembrane receptor protein which may function in angiogenesis, lymphocyte homing, cell adhesion, or receptor scavenging. The protein contains 7 fasciclin, 16 epidermal growth factor (EGF)-like, and 2 laminin-type EGF-like domains as well as a C-type lectin-like hyaluronan-binding Link module. The protein is primarily expressed on sinusoidal endothelial cells of liver, spleen, and lymph node. The receptor has been shown to endocytose ligands such as low density lipoprotein, Gram-positive and Gram-negative bacteria, and advanced glycosylation end products. Supporting its possible role as a scavenger receptor, the protein rapidly cycles between the plasma membrane and early endosomes. stabilin 1 23166 NA
IGLL5 ENSG00000254709 This gene encodes one of the immunoglobulin lambda-like polypeptides. It is located within the immunoglobulin lambda locus but it does not require somatic rearrangement for expression. The first exon of this gene is unrelated to immunoglobulin variable genes; the second and third exons are the immunoglobulin lambda joining 1 and the immunoglobulin lambda constant 1 gene segments. Alternative splicing results in multiple transcript variants. immunoglobulin lambda like polypeptide 5 100423062 NA
CRYL1 ENSG00000165475 The uronate cycle functions as an alternative glucose metabolic pathway, accounting for about 5% of daily glucose catabolism. The product of this gene catalyzes the dehydrogenation of L-gulonate into dehydro-L-gulonate in the uronate cycle. The enzyme requires NAD(H) as a coenzyme, and is inhibited by inorganic phosphate. A similar gene in the rabbit is thought to serve a structural role in the lens of the eye. crystallin lambda 1 51084 NA
METTL7A ENSG00000185432 NA methyltransferase like 7A 25840 NA
PEPD ENSG00000124299 This gene encodes a member of the peptidase family. The protein forms a homodimer that hydrolyzes dipeptides or tripeptides with C-terminal proline or hydroxyproline residues. The enzyme serves an important role in the recycling of proline, and may be rate limiting for the production of collagen. Mutations in this gene result in prolidase deficiency, which is characterized by the excretion of large amounts of di- and tri-peptides containing proline. Multiple transcript variants encoding different isoforms have been found for this gene. peptidase D 5184 NA
IQGAP2 ENSG00000145703 This gene encodes a member of the IQGAP family. The protein contains three IQ domains, one calponin homology domain, one Ras-GAP domain and one WW domain. It interacts with components of the cytoskeleton, with cell adhesion molecules, and with several signaling molecules to regulate cell morphology and motility. IQ motif containing GTPase activating protein 2 10788 NA
OLFM4 ENSG00000102837 This gene was originally cloned from human myeloblasts and found to be selectively expressed in inflamed colonic epithelium. This gene encodes a member of the olfactomedin family. The encoded protein is an antiapoptotic factor that promotes tumor growth and is an extracellular matrix glycoprotein that facilitates cell adhesion. olfactomedin 4 10562 NA
FRMD6 ENSG00000139926 NA FERM domain containing 6 122786 NA
RPLP1 ENSG00000137818 Ribosomes, the organelles that catalyze protein synthesis, consist of a small 40S subunit and a large 60S subunit. Together these subunits are composed of 4 RNA species and approximately 80 structurally distinct proteins. This gene encodes a ribosomal phosphoprotein that is a component of the 60S subunit. The protein, which is a functional equivalent of the E. coli L7/L12 ribosomal protein, belongs to the L12P family of ribosomal proteins. It plays an important role in the elongation step of protein synthesis. Unlike most ribosomal proteins, which are basic, the encoded protein is acidic. Its C-terminal end is nearly identical to the C-terminal ends of the ribosomal phosphoproteins P0 and P2. The P1 protein can interact with P0 and P2 to form a pentameric complex consisting of P1 and P2 dimers, and a P0 monomer. The protein is located in the cytoplasm. Two alternatively spliced transcript variants that encode different proteins have been observed. As is typical for genes encoding ribosomal proteins, there are multiple processed pseudogenes of this gene dispersed through the genome. ribosomal protein lateral stalk subunit P1 6176 NA
ACTA1 ENSG00000143632 The product encoded by this gene belongs to the actin family of proteins, which are highly conserved proteins that play a role in cell motility, structure and integrity. Alpha, beta and gamma actin isoforms have been identified, with alpha actins being a major constituent of the contractile apparatus, while beta and gamma actins are involved in the regulation of cell motility. This actin is an alpha actin that is found in skeletal muscle. Mutations in this gene cause nemaline myopathy type 3, congenital myopathy with excess of thin myofilaments, congenital myopathy with cores, and congenital myopathy with fiber-type disproportion, diseases that lead to muscle fiber defects. actin, alpha 1, skeletal muscle 58 NA
write.table(as.factor(out$query),
            paste0("../utilities/GTEX2013_sparse_fac_sqrt/gene_names_clus_", 16, ".txt"),
            col.names = FALSE, row.names = FALSE, quote = FALSE);
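
The export step above is repeated for every factor with the factor index typed in by hand. A minimal sketch of how it could be wrapped in a function is given below; `write_factor_genes` and its arguments are hypothetical names introduced here for illustration and are not part of the original analysis. The example writes to a temporary directory rather than `../utilities/`, and uses two Ensembl IDs that appear in the Factor 17 table.

```r
# Hypothetical helper (not in the original analysis): write the queried
# Ensembl IDs for one factor to a one-column text file, building the file
# name from the factor index instead of hard-coding it.
write_factor_genes <- function(ids, factor_index, out_dir) {
  path <- file.path(out_dir, paste0("gene_names_clus_", factor_index, ".txt"))
  write.table(ids, path, col.names = FALSE, row.names = FALSE, quote = FALSE)
  invisible(path)
}

# Example: two IDs from the Factor 17 table, written to a temporary directory.
ids <- c("ENSG00000197971", "ENSG00000131095")
path <- write_factor_genes(ids, 17, tempdir())
readLines(path)
```

In the report itself, `ids` would be `out$query` and `out_dir` the `GTEX2013_sparse_fac_sqrt` folder; whether a loop over all factors is preferable to the per-factor sections depends on how the annotations are meant to be read.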

Factor 17 Annotations

out <- mygene::queryMany(gene_list[17,],  scopes="ensembl.gene", fields=c("name", "summary", "symbol"), species="human");
## Finished
## Pass returnall=TRUE to return lists of duplicate or missing query terms.
kable(as.data.frame(out))
name query symbol summary X_id notfound
myelin basic protein ENSG00000197971 MBP The protein encoded by the classic MBP gene is a major constituent of the myelin sheath of oligodendrocytes and Schwann cells in the nervous system. However, MBP-related transcripts are also present in the bone marrow and the immune system. These mRNAs arise from the long MBP gene (otherwise called ‘Golli-MBP’) that contains 3 additional exons located upstream of the classic MBP exons. Alternative splicing from the Golli and the MBP transcription start sites gives rise to 2 sets of MBP-related transcripts and gene products. The Golli mRNAs contain 3 exons unique to Golli-MBP, spliced in-frame to 1 or more MBP exons. They encode hybrid proteins that have N-terminal Golli aa sequence linked to MBP aa sequence. The second family of transcripts contain only MBP exons and produce the well characterized myelin basic proteins. This complex gene structure is conserved among species suggesting that the MBP transcription unit is an integral part of the Golli transcription unit and that this arrangement is important for the function and/or regulation of these genes. 4155 NA
NA ENSG00000266844 RP11-862L9.3 NA ENSG00000266844 NA
glial fibrillary acidic protein ENSG00000131095 GFAP This gene encodes one of the major intermediate filament proteins of mature astrocytes. It is used as a marker to distinguish astrocytes from other glial cells during development. Mutations in this gene cause Alexander disease, a rare disorder of astrocytes in the central nervous system. Alternative splicing results in multiple transcript variants encoding distinct isoforms. 2670 NA
alanyl aminopeptidase, membrane ENSG00000166825 ANPEP Aminopeptidase N is located in the small-intestinal and renal microvillar membrane, and also in other plasma membranes. In the small intestine aminopeptidase N plays a role in the final digestion of peptides generated from hydrolysis of proteins by gastric and pancreatic proteases. Its function in proximal tubular epithelial cells and other cell types is less clear. The large extracellular carboxyterminal domain contains a pentapeptide consensus sequence characteristic of members of the zinc-binding metalloproteinase superfamily. Sequence comparisons with known enzymes of this class showed that CD13 and aminopeptidase N are identical. The latter enzyme was thought to be involved in the metabolism of regulatory peptides by diverse cell types, including small intestinal and renal tubular epithelial cells, macrophages, granulocytes, and synaptic membranes from the CNS. Human aminopeptidase N is a receptor for one strain of human coronavirus that is an important cause of upper respiratory tract infections. Defects in this gene appear to be a cause of various types of leukemia or lymphoma. 290 NA
keratin 13 ENSG00000171401 KRT13 The protein encoded by this gene is a member of the keratin gene family. The keratins are intermediate filament proteins responsible for the structural integrity of epithelial cells and are subdivided into cytokeratins and hair keratins. Most of the type I cytokeratins consist of acidic proteins which are arranged in pairs of heterotypic keratin chains. This type I cytokeratin is paired with keratin 4 and expressed in the suprabasal layers of non-cornified stratified epithelia. Mutations in this gene and keratin 4 have been associated with the autosomal dominant disorder White Sponge Nevus. The type I cytokeratins are clustered in a region of chromosome 17q21.2. Alternative splicing of this gene results in multiple transcript variants; however, not all variants have been described. 3860 NA
maturin, neural progenitor differentiation regulator homolog (Xenopus) ENSG00000180354 MTURN NA 222166 NA
S100 calcium binding protein B ENSG00000160307 S100B The protein encoded by this gene is a member of the S100 family of proteins containing 2 EF-hand calcium-binding motifs. S100 proteins are localized in the cytoplasm and/or nucleus of a wide range of cells, and involved in the regulation of a number of cellular processes such as cell cycle progression and differentiation. S100 genes include at least 13 members which are located as a cluster on chromosome 1q21; however, this gene is located at 21q22.3. This protein may function in neurite extension, proliferation of melanoma cells, stimulation of Ca2+ fluxes, inhibition of PKC-mediated phosphorylation, astrocytosis and axonal proliferation, and inhibition of microtubule assembly. Chromosomal rearrangements and altered expression of this gene have been implicated in several neurological, neoplastic, and other types of diseases, including Alzheimer’s disease, Down’s syndrome, epilepsy, amyotrophic lateral sclerosis, melanoma, and type I diabetes. 6285 NA
regulator of G-protein signaling 1 ENSG00000090104 RGS1 This gene encodes a member of the regulator of G-protein signalling family. This protein is located on the cytosolic side of the plasma membrane and contains a conserved, 120 amino acid motif called the RGS domain. The protein attenuates the signalling activity of G-proteins by binding to activated, GTP-bound G alpha subunits and acting as a GTPase activating protein (GAP), increasing the rate of conversion of the GTP to GDP. This hydrolysis allows the G alpha subunits to bind G beta/gamma subunit heterodimers, forming inactive G-protein heterotrimers, thereby terminating the signal. 5996 NA
latent transforming growth factor beta binding protein 4 ENSG00000090006 LTBP4 The protein encoded by this gene binds transforming growth factor beta (TGFB) as it is secreted and targeted to the extracellular matrix. TGFB is biologically latent after secretion and insertion into the extracellular matrix, and sheds TGFB and other proteins upon activation. Defects in this gene may be a cause of cutis laxa and severe pulmonary, gastrointestinal, and urinary abnormalities. Three transcript variants encoding different isoforms have been found for this gene. 8425 NA
collagen type I alpha 1 ENSG00000108821 COL1A1 This gene encodes the pro-alpha1 chains of type I collagen whose triple helix comprises two alpha1 chains and one alpha2 chain. Type I is a fibril-forming collagen found in most connective tissues and is abundant in bone, cornea, dermis and tendon. Mutations in this gene are associated with osteogenesis imperfecta types I-IV, Ehlers-Danlos syndrome type VIIA, Ehlers-Danlos syndrome Classical type, Caffey Disease and idiopathic osteoporosis. Reciprocal translocations between chromosomes 17 and 22, where this gene and the gene for platelet-derived growth factor beta are located, are associated with a particular type of skin tumor called dermatofibrosarcoma protuberans, resulting from unregulated expression of the growth factor. Two transcripts, resulting from the use of alternate polyadenylation signals, have been identified for this gene. 1277 NA
N-myc downstream regulated 1 ENSG00000104419 NDRG1 This gene is a member of the N-myc downregulated gene family which belongs to the alpha/beta hydrolase superfamily. The protein encoded by this gene is a cytoplasmic protein involved in stress responses, hormone responses, cell growth, and differentiation. The encoded protein is necessary for p53-mediated caspase activation and apoptosis. Mutations in this gene are a cause of Charcot-Marie-Tooth disease type 4D, and expression of this gene may be a prognostic indicator for several types of cancer. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene. 10397 NA
keratin 4 ENSG00000170477 KRT4 The protein encoded by this gene is a member of the keratin gene family. The type II cytokeratins consist of basic or neutral proteins which are arranged in pairs of heterotypic keratin chains coexpressed during differentiation of simple and stratified epithelial tissues. This type II cytokeratin is specifically expressed in differentiated layers of the mucosal and esophageal epithelia with family member KRT13. Mutations in these genes have been associated with White Sponge Nevus, characterized by oral, esophageal, and anal leukoplakia. The type II cytokeratins are clustered in a region of chromosome 12q12-q13. 3851 NA
heat shock protein 90kDa alpha family class A member 1 ENSG00000080824 HSP90AA1 The protein encoded by this gene is an inducible molecular chaperone that functions as a homodimer. The encoded protein aids in the proper folding of specific target proteins by use of an ATPase activity that is modulated by co-chaperones. Two transcript variants encoding different isoforms have been found for this gene. 3320 NA
claudin domain containing 1 ENSG00000080822 CLDND1 NA 56650 NA
pleckstrin homology domain containing B1 ENSG00000021300 PLEKHB1 NA 58473 NA
collagen type I alpha 2 chain ENSG00000164692 COL1A2 This gene encodes the pro-alpha2 chain of type I collagen whose triple helix comprises two alpha1 chains and one alpha2 chain. Type I is a fibril-forming collagen found in most connective tissues and is abundant in bone, cornea, dermis and tendon. Mutations in this gene are associated with osteogenesis imperfecta types I-IV, Ehlers-Danlos syndrome type VIIB, recessive Ehlers-Danlos syndrome Classical type, idiopathic osteoporosis, and atypical Marfan syndrome. Symptoms associated with mutations in this gene, however, tend to be less severe than mutations in the gene for the alpha1 chain of type I collagen (COL1A1) reflecting the different role of alpha2 chains in matrix integrity. Three transcripts, resulting from the use of alternate polyadenylation signals, have been identified for this gene. 1278 NA
membrane metallo-endopeptidase ENSG00000196549 MME This gene encodes a common acute lymphocytic leukemia antigen that is an important cell surface marker in the diagnosis of human acute lymphocytic leukemia (ALL). This protein is present on leukemic cells of pre-B phenotype, which represent 85% of cases of ALL. This protein is not restricted to leukemic cells, however, and is found on a variety of normal tissues. It is a glycoprotein that is particularly abundant in kidney, where it is present on the brush border of proximal tubules and on glomerular epithelium. The protein is a neutral endopeptidase that cleaves peptides at the amino side of hydrophobic residues and inactivates several peptide hormones including glucagon, enkephalins, substance P, neurotensin, oxytocin, and bradykinin. This gene, which encodes a 100-kD type II transmembrane glycoprotein, exists in a single copy of greater than 45 kb. The 5’ untranslated region of this gene is alternatively spliced, resulting in four separate mRNA transcripts. The coding region is not affected by alternative splicing. 4311 NA
tumor protein p53 inducible nuclear protein 2 ENSG00000078804 TP53INP2 NA 58476 NA
progestin and adipoQ receptor family member 6 ENSG00000160781 PAQR6 NA 79957 NA
eukaryotic translation elongation factor 2 ENSG00000167658 EEF2 This gene encodes a member of the GTP-binding translation elongation factor family. This protein is an essential factor for protein synthesis. It promotes the GTP-dependent translocation of the nascent protein chain from the A-site to the P-site of the ribosome. This protein is completely inactivated by EF-2 kinase phosphorylation. 1938 NA
laminin subunit alpha 5 ENSG00000130702 LAMA5 This gene encodes one of the vertebrate laminin alpha chains. Laminins, a family of extracellular matrix glycoproteins, are the major noncollagenous constituent of basement membranes. They have been implicated in a wide variety of biological processes including cell adhesion, differentiation, migration, signaling, neurite outgrowth and metastasis. Laminins are composed of 3 non identical chains: laminin alpha, beta and gamma (formerly A, B1, and B2, respectively) and they form a cruciform structure consisting of 3 short arms, each formed by a different chain, and a long arm composed of all 3 chains. Each laminin chain is a multidomain protein encoded by a distinct gene. The protein encoded by this gene is the alpha-5 subunit of laminin-10 (laminin-511), laminin-11 (laminin-521) and laminin-15 (laminin-523). 3911 NA
CD9 molecule ENSG00000010278 CD9 This gene encodes a member of the transmembrane 4 superfamily, also known as the tetraspanin family. Tetraspanins are cell surface glycoproteins with four transmembrane domains that form multimeric complexes with other cell surface proteins. The encoded protein functions in many cellular processes including differentiation, adhesion, and signal transduction, and expression of this gene plays a critical role in the suppression of cancer cell motility and metastasis. 928 NA
prostaglandin D2 synthase ENSG00000107317 PTGDS The protein encoded by this gene is a glutathione-independent prostaglandin D synthase that catalyzes the conversion of prostaglandin H2 (PGH2) to prostaglandin D2 (PGD2). PGD2 functions as a neuromodulator as well as a trophic factor in the central nervous system. PGD2 is also involved in smooth muscle contraction/relaxation and is a potent inhibitor of platelet aggregation. This gene is preferentially expressed in brain. Studies with transgenic mice overexpressing this gene suggest that this gene may be also involved in the regulation of non-rapid eye movement sleep. 5730 NA
ZFP36 ring finger protein ENSG00000128016 ZFP36 NA 7538 NA
myelin protein zero ENSG00000158887 MPZ This gene is specifically expressed in Schwann cells of the peripheral nervous system and encodes a type I transmembrane glycoprotein that is a major structural protein of the peripheral myelin sheath. The encoded protein contains a large hydrophobic extracellular domain and a smaller basic intracellular domain, which are essential for the formation and stabilization of the multilamellar structure of the compact myelin. Mutations in this gene are associated with autosomal dominant form of Charcot-Marie-Tooth disease type 1 (CMT1B) and other polyneuropathies, such as Dejerine-Sottas syndrome (DSS) and congenital hypomyelinating neuropathy (CHN). A recent study showed that two isoforms are produced from the same mRNA by use of alternative in-frame translation termination codons via a stop codon readthrough mechanism. 4359 NA
NA ENSG00000229732 AC019349.5 NA ENSG00000229732 NA
septin 4 ENSG00000108387 SEPT4 This gene is a member of the septin family of nucleotide binding proteins, originally described in yeast as cell division cycle regulatory proteins. Septins are highly conserved in yeast, Drosophila, and mouse, and appear to regulate cytoskeletal organization. Disruption of septin function disturbs cytokinesis and results in large multinucleate or polyploid cells. This gene is highly expressed in brain and heart. Alternatively spliced transcript variants encoding different isoforms have been described for this gene. One of the isoforms (known as ARTS) is distinct; it is localized to the mitochondria, and has a role in apoptosis and cancer. 5414 NA
interleukin 1 receptor antagonist ENSG00000136689 IL1RN The protein encoded by this gene is a member of the interleukin 1 cytokine family. This protein inhibits the activities of interleukin 1, alpha (IL1A) and interleukin 1, beta (IL1B), and modulates a variety of interleukin 1 related immune and inflammatory responses. This gene and five other closely related cytokine genes form a gene cluster spanning approximately 400 kb on chromosome 2. A polymorphism of this gene is reported to be associated with increased risk of osteoporotic fractures and gastric cancer. Several alternatively spliced transcript variants encoding distinct isoforms have been reported. 3557 NA
cystatin A ENSG00000121552 CSTA The cystatin superfamily encompasses proteins that contain multiple cystatin-like sequences. Some of the members are active cysteine protease inhibitors, while others have lost or perhaps never acquired this inhibitory activity. There are three inhibitory families in the superfamily, including the type 1 cystatins (stefins), type 2 cystatins, and kininogens. This gene encodes a stefin that functions as a cysteine protease inhibitor, forming tight complexes with papain and the cathepsins B, H, and L. The protein is one of the precursor proteins of cornified cell envelope in keratinocytes and plays a role in epidermal development and maintenance. Stefins have been proposed as prognostic and diagnostic tools for cancer. 1475 NA
plakophilin 4 ENSG00000144283 PKP4 Armadillo-like proteins are characterized by a series of armadillo repeats, first defined in the Drosophila ‘armadillo’ gene product, that are typically 42 to 45 amino acids in length. These proteins can be divided into subfamilies based on their number of repeats, their overall sequence similarity, and the dispersion of the repeats throughout their sequences. Members of the p120(ctn)/plakophilin subfamily of Armadillo-like proteins include CTNND1, CTNND2, PKP1, PKP2, PKP4, and ARVCF. PKP4 may be a component of desmosomal plaque and other adhesion plaques and is thought to be involved in regulating junctional plaque organization and cadherin function. Multiple transcript variants encoding different isoforms have been found for this gene. 8502 NA
CD63 molecule ENSG00000135404 CD63 The protein encoded by this gene is a member of the transmembrane 4 superfamily, also known as the tetraspanin family. Most of these members are cell-surface proteins that are characterized by the presence of four hydrophobic domains. The proteins mediate signal transduction events that play a role in the regulation of cell development, activation, growth and motility. The encoded protein is a cell surface glycoprotein that is known to complex with integrins. It may function as a blood platelet activation marker. Deficiency of this protein is associated with Hermansky-Pudlak syndrome. Also this gene has been associated with tumor progression. Alternative splicing results in multiple transcript variants encoding different protein isoforms. 967 NA
ankyrin repeat domain 1 ENSG00000148677 ANKRD1 The protein encoded by this gene is localized to the nucleus of endothelial cells and is induced by IL-1 and TNF-alpha stimulation. Studies in rat cardiomyocytes suggest that this gene functions as a transcription factor. Interactions between this protein and the sarcomeric proteins myopalladin and titin suggest that it may also be involved in the myofibrillar stretch-sensor system. 27063 NA
small proline rich protein 3 ENSG00000163209 SPRR3 NA 6707 NA
pentraxin 3 ENSG00000163661 PTX3 NA 5806 NA
endoplasmic reticulum-golgi intermediate compartment 1 ENSG00000113719 ERGIC1 This gene encodes a cycling membrane protein which is an endoplasmic reticulum-golgi intermediate compartment (ERGIC) protein which interacts with other members of this protein family to increase their turnover. 57222 NA
S100 calcium binding protein A9 ENSG00000163220 S100A9 The protein encoded by this gene is a member of the S100 family of proteins containing 2 EF-hand calcium-binding motifs. S100 proteins are localized in the cytoplasm and/or nucleus of a wide range of cells, and involved in the regulation of a number of cellular processes such as cell cycle progression and differentiation. S100 genes include at least 13 members which are located as a cluster on chromosome 1q21. This protein may function in the inhibition of casein kinase and altered expression of this protein is associated with the disease cystic fibrosis. This antimicrobial protein exhibits antifungal and antibacterial activity. 6280 NA
serpin family B member 9 ENSG00000170542 SERPINB9 This gene encodes a member of the serine protease inhibitor family which are also known as serpins. The encoded protein belongs to a subfamily of intracellular serpins. This protein inhibits the activity of the effector molecule granzyme B. Overexpression of this protein may prevent cytotoxic T-lymphocytes from eliminating certain tumor cells. A pseudogene of this gene is found on chromosome 6. 5272 NA
obscurin-like 1 ENSG00000124006 OBSL1 Cytoskeletal adaptor proteins function in linking the internal cytoskeleton of cells to the cell membrane. This gene encodes a cytoskeletal adaptor protein, which is a member of the Unc-89/obscurin family. The protein contains multiple N- and C-terminal immunoglobulin (Ig)-like domains and a central fibronectin type 3 domain. Mutations in this gene cause 3M syndrome type 2. Alternatively spliced transcript variants encoding different isoforms have been found in this gene. 23363 NA
StAR related lipid transfer domain containing 9 ENSG00000159433 STARD9 NA 57519 NA
XIAP associated factor 1 ENSG00000132530 XAF1 This gene encodes a protein which binds to and counteracts the inhibitory effect of a member of the IAP (inhibitor of apoptosis) protein family. IAP proteins bind to and inhibit caspases which are activated during apoptosis. The proportion of IAPs and proteins which interfere with their activity, such as the encoded protein, affect the progress of the apoptosis signaling pathway. Multiple transcript variants encoding different isoforms have been found for this gene. 54739 NA
basic helix-loop-helix family member e40 ENSG00000134107 BHLHE40 This gene encodes a basic helix-loop-helix protein expressed in various tissues. The encoded protein can interact with ARNTL or compete for E-box binding sites in the promoter of PER1 and repress CLOCK/ARNTL’s transactivation of PER1. This gene is believed to be involved in the control of circadian rhythm and cell differentiation. 8553 NA
cystatin B ENSG00000160213 CSTB The cystatin superfamily encompasses proteins that contain multiple cystatin-like sequences. Some of the members are active cysteine protease inhibitors, while others have lost or perhaps never acquired this inhibitory activity. There are three inhibitory families in the superfamily, including the type 1 cystatins (stefins), type 2 cystatins and kininogens. This gene encodes a stefin that functions as an intracellular thiol protease inhibitor. The protein is able to form a dimer stabilized by noncovalent forces, inhibiting papain and cathepsins l, h and b. The protein is thought to play a role in protecting against the proteases leaking from lysosomes. Evidence indicates that mutations in this gene are responsible for the primary defects in patients with progressive myoclonic epilepsy (EPM1). 1476 NA
hypoxia inducible lipid droplet associated ENSG00000135245 HILPDA NA 29923 NA
thioredoxin reductase 1 ENSG00000198431 TXNRD1 This gene encodes a member of the family of pyridine nucleotide oxidoreductases. This protein reduces thioredoxins as well as other substrates, and plays a role in selenium metabolism and protection against oxidative stress. The functional enzyme is thought to be a homodimer which uses FAD as a cofactor. Each subunit contains a selenocysteine (Sec) residue which is required for catalytic activity. The selenocysteine is encoded by the UGA codon that normally signals translation termination. The 3’ UTR of selenocysteine-containing genes have a common stem-loop structure, the sec insertion sequence (SECIS), that is necessary for the recognition of UGA as a Sec codon rather than as a stop signal. Alternative splicing results in several transcript variants encoding the same or different isoforms. 7296 NA
transferrin ENSG00000091513 TF This gene encodes a glycoprotein with an approximate molecular weight of 76.5 kDa. It is thought to have been created as a result of an ancient gene duplication event that led to generation of homologous C and N-terminal domains each of which binds one ion of ferric iron. The function of this protein is to transport iron from the intestine, reticuloendothelial system, and liver parenchymal cells to all proliferating cells in the body. This protein may also have a physiologic role as granulocyte/pollen-binding protein (GPBP) involved in the removal of certain organic matter and allergens from serum. 7018 NA
periaxin ENSG00000105227 PRX This gene encodes a protein involved in peripheral nerve myelin upkeep. The encoded protein contains 2 PDZ domains which were named after PSD95 (post synaptic density protein), DlgA (Drosophila disc large tumor suppressor), and ZO1 (a mammalian tight junction protein). Two alternatively spliced transcript variants have been described for this gene which encode different protein isoforms and which are targeted differently in the Schwann cell. Mutations in this gene cause Charcot-Marie-Tooth neuropathy, type 4F and Dejerine-Sottas neuropathy. 57716 NA
semaphorin 4C ENSG00000168758 SEMA4C NA 54910 NA
IKAROS family zinc finger 2 ENSG00000030419 IKZF2 This gene encodes a member of the Ikaros family of zinc-finger proteins. Three members of this protein family (Ikaros, Aiolos and Helios) are hematopoietic-specific transcription factors involved in the regulation of lymphocyte development. This protein forms homo- or hetero-dimers with other Ikaros family members, and is thought to function predominantly in early hematopoietic development. Multiple transcript variants encoding different isoforms have been found for this gene, but the biological validity of some variants has not been determined. 22807 NA
WNK lysine deficient protein kinase 1 ENSG00000060237 WNK1 This gene encodes a member of the WNK subfamily of serine/threonine protein kinases. The encoded protein may be a key regulator of blood pressure by controlling the transport of sodium and chloride ions. Mutations in this gene have been associated with pseudohypoaldosteronism type II and hereditary sensory neuropathy type II. Alternatively spliced transcript variants encoding different isoforms have been described but the full-length nature of all of them has yet to be determined. 65125 NA
spectrin beta, non-erythrocytic 1 ENSG00000115306 SPTBN1 Spectrin is an actin crosslinking and molecular scaffold protein that links the plasma membrane to the actin cytoskeleton, and functions in the determination of cell shape, arrangement of transmembrane proteins, and organization of organelles. It is composed of two antiparallel dimers of alpha- and beta- subunits. This gene is one member of a family of beta-spectrin genes. The encoded protein contains an N-terminal actin-binding domain, and 17 spectrin repeats which are involved in dimer formation. Multiple transcript variants encoding different isoforms have been found for this gene. 6711 NA
high density lipoprotein binding protein ENSG00000115677 HDLBP The protein encoded by this gene binds high density lipoprotein (HDL) and may function to regulate excess cholesterol levels in cells. The encoded protein also binds RNA and can induce heterochromatin formation. 3069 NA
myelin protein zero like 2 ENSG00000149573 MPZL2 Thymus development depends on a complex series of interactions between thymocytes and the stromal component of the organ. Epithelial V-like antigen (EVA) is expressed in thymus epithelium and strongly downregulated by thymocyte developmental progression. This gene is expressed in the thymus and in several epithelial structures early in embryogenesis. It is highly homologous to the myelin protein zero and, in thymus-derived epithelial cell lines, is poorly soluble in nonionic detergents, strongly suggesting an association to the cytoskeleton. Its capacity to mediate cell adhesion through a homophilic interaction and its selective regulation by T cell maturation might imply the participation of EVA in the earliest phases of thymus organogenesis. The protein bears a characteristic V-type domain and two potential N-glycosylation sites in the extracellular domain; a putative serine phosphorylation site for casein kinase 2 is also present in the cytoplasmic tail. Two transcript variants encoding the same protein have been found for this gene. 10205 NA
small ArfGAP2 ENSG00000084070 SMAP2 NA 64744 NA
vimentin ENSG00000026025 VIM This gene encodes a member of the intermediate filament family. Intermediate filaments, along with microtubules and actin microfilaments, make up the cytoskeleton. The protein encoded by this gene is responsible for maintaining cell shape, integrity of the cytoplasm, and stabilizing cytoskeletal interactions. It is also involved in the immune response, and controls the transport of low-density lipoprotein (LDL)-derived cholesterol from a lysosome to the site of esterification. It functions as an organizer of a number of critical proteins involved in attachment, migration, and cell signaling. Mutations in this gene cause a dominant, pulverulent cataract. 7431 NA
arginine and glutamate rich 1 ENSG00000134884 ARGLU1 NA 55082 NA
dystonin ENSG00000151914 DST This gene encodes a member of the plakin protein family of adhesion junction plaque proteins. Multiple alternatively spliced transcript variants encoding distinct isoforms have been found for this gene, but the full-length nature of some variants has not been defined. It has been reported that some isoforms are expressed in neural and muscle tissue, anchoring neural intermediate filaments to the actin cytoskeleton, and some isoforms are expressed in epithelial tissue, anchoring keratin-containing intermediate filaments to hemidesmosomes. Consistent with the expression, mice defective for this gene show skin blistering and neurodegeneration. 667 NA
DYX1C1-CCPG1 readthrough (NMD candidate) ENSG00000261771 DYX1C1-CCPG1 This locus represents naturally occurring read-through transcription between the neighboring dyslexia susceptibility 1 candidate 1 (DYX1C1) and cell cycle progression 1 (CCPG1) genes on chromosome 15. The read-through transcript is a candidate for nonsense-mediated mRNA decay (NMD), and is thus unlikely to produce a protein product. 100533483 NA
2’-5’-oligoadenylate synthetase 3 ENSG00000111331 OAS3 This gene encodes an enzyme included in the 2’, 5’ oligoadenylate synthase family. This enzyme is induced by interferons and catalyzes the 2’, 5’ oligomers of adenosine in order to bind and activate RNase L. This enzyme family plays a significant role in the inhibition of cellular protein synthesis and viral infection resistance. 4940 NA
brain protein I3 ENSG00000164713 BRI3 NA 25798 NA
CDC like kinase 1 ENSG00000013441 CLK1 This gene encodes a member of the CDC2-like (or LAMMER) family of dual specificity protein kinases. In the nucleus, the encoded protein phosphorylates serine/arginine-rich proteins involved in pre-mRNA processing, releasing them into the nucleoplasm. The choice of splice sites during pre-mRNA processing may be regulated by the concentration of transacting factors, including serine/arginine rich proteins. Therefore, the encoded protein may play an indirect role in governing splice site selection. Multiple transcript variants encoding different isoforms have been found for this gene. 1195 NA
titin ENSG00000155657 TTN This gene encodes a large abundant protein of striated muscle. The product of this gene is divided into two regions, a N-terminal I-band and a C-terminal A-band. The I-band, which is the elastic part of the molecule, contains two regions of tandem immunoglobulin domains on either side of a PEVK region that is rich in proline, glutamate, valine and lysine. The A-band, which is thought to act as a protein-ruler, contains a mixture of immunoglobulin and fibronectin repeats, and possesses kinase activity. An N-terminal Z-disc region and a C-terminal M-line region bind to the Z-line and M-line of the sarcomere, respectively, so that a single titin molecule spans half the length of a sarcomere. Titin also contains binding sites for muscle associated proteins so it serves as an adhesion template for the assembly of contractile machinery in muscle cells. It has also been identified as a structural protein for chromosomes. Alternative splicing of this gene results in multiple transcript variants. Considerable variability exists in the I-band, the M-line and the Z-disc regions of titin. Variability in the I-band region contributes to the differences in elasticity of different titin isoforms and, therefore, to the differences in elasticity of different muscle types. Mutations in this gene are associated with familial hypertrophic cardiomyopathy 9, and autoantibodies to titin are produced in patients with the autoimmune disease scleroderma. 7273 NA
colony stimulating factor 3 receptor ENSG00000119535 CSF3R The protein encoded by this gene is the receptor for colony stimulating factor 3, a cytokine that controls the production, differentiation, and function of granulocytes. The encoded protein, which is a member of the family of cytokine receptors, may also function in some cell surface adhesion or recognition processes. Alternatively spliced transcript variants have been described. Mutations in this gene are a cause of Kostmann syndrome, also known as severe congenital neutropenia. 1441 NA
carboxypeptidase A1 ENSG00000091704 CPA1 This gene encodes a member of the carboxypeptidase A family of zinc metalloproteases. This enzyme is produced in the pancreas and preferentially cleaves C-terminal branched-chain and aromatic amino acids from dietary proteins. This gene and several family members are present in a gene cluster on chromosome 7. Mutations in this gene may be linked to chronic pancreatitis, while elevated protein levels may be associated with pancreatic cancer. 1357 NA
von Willebrand factor A domain containing 1 ENSG00000179403 VWA1 VWA1 belongs to the von Willebrand factor (VWF; MIM 613160) A (VWFA) domain superfamily of extracellular matrix proteins and appears to play a role in cartilage structure and function (Fitzgerald et al., 2002 [PubMed 12062410]). 64856 NA
heat shock protein family A (Hsp70) member 1B ENSG00000204388 HSPA1B This intronless gene encodes a 70kDa heat shock protein which is a member of the heat shock protein 70 family. In conjunction with other heat shock proteins, this protein stabilizes existing proteins against aggregation and mediates the folding of newly translated proteins in the cytosol and in organelles. It is also involved in the ubiquitin-proteasome pathway through interaction with the AU-rich element RNA-binding protein 1. The gene is located in the major histocompatibility complex class III region, in a cluster with two closely related genes which encode similar proteins. 3304 NA
prolyl 4-hydroxylase subunit beta ENSG00000185624 P4HB This gene encodes the beta subunit of prolyl 4-hydroxylase, a highly abundant multifunctional enzyme that belongs to the protein disulfide isomerase family. When present as a tetramer consisting of two alpha and two beta subunits, this enzyme is involved in hydroxylation of prolyl residues in preprocollagen. This enzyme is also a disulfide isomerase containing two thioredoxin domains that catalyze the formation, breakage and rearrangement of disulfide bonds. Other known functions include its ability to act as a chaperone that inhibits aggregation of misfolded proteins in a concentration-dependent manner, its ability to bind thyroid hormone, its role in both the influx and efflux of S-nitrosothiol-bound nitric oxide, and its function as a subunit of the microsomal triglyceride transfer protein complex. 5034 NA
cytochrome P450 family 17 subfamily A member 1 ENSG00000148795 CYP17A1 This gene encodes a member of the cytochrome P450 superfamily of enzymes. The cytochrome P450 proteins are monooxygenases which catalyze many reactions involved in drug metabolism and synthesis of cholesterol, steroids and other lipids. This protein localizes to the endoplasmic reticulum. It has both 17alpha-hydroxylase and 17,20-lyase activities and is a key enzyme in the steroidogenic pathway that produces progestins, mineralocorticoids, glucocorticoids, androgens, and estrogens. Mutations in this gene are associated with isolated steroid-17 alpha-hydroxylase deficiency, 17-alpha-hydroxylase/17,20-lyase deficiency, pseudohermaphroditism, and adrenal hyperplasia. 1586 NA
LRRC75A antisense RNA 1 ENSG00000175061 LRRC75A-AS1 NA 125144 NA
LDL receptor related protein associated protein 1 ENSG00000163956 LRPAP1 This gene encodes a protein that interacts with the low density lipoprotein (LDL) receptor-related protein and facilitates its proper folding and localization by preventing the binding of ligands. Mutations in this gene have been identified in individuals with myopia 23. Alternative splicing results in multiple transcript variants. 4043 NA
acyl-CoA synthetase long-chain family member 5 ENSG00000197142 ACSL5 The protein encoded by this gene is an isozyme of the long-chain fatty-acid-coenzyme A ligase family. Although differing in substrate specificity, subcellular localization, and tissue distribution, all isozymes of this family convert free long-chain fatty acids into fatty acyl-CoA esters, and thereby play a key role in lipid biosynthesis and fatty acid degradation. This isozyme is highly expressed in uterus and spleen, and in trace amounts in normal brain, but has markedly increased levels in malignant gliomas. This gene functions in mediating fatty acid-induced glioma cell growth. Three transcript variants encoding two different isoforms have been found for this gene. 51703 NA
cornulin ENSG00000143536 CRNN This gene encodes a member of the ‘fused gene’ family of proteins, which contain N-terminus EF-hand domains and multiple tandem peptide repeats. The encoded protein contains two EF-hand Ca2+ binding domains in its N-terminus and two glutamine- and threonine-rich 60 amino acid repeats in its C-terminus. This gene, also known as squamous epithelial heat shock protein 53, may play a role in the mucosal/epithelial immune response and epidermal differentiation. 49860 NA
activating transcription factor 3 ENSG00000162772 ATF3 This gene encodes a member of the mammalian activation transcription factor/cAMP responsive element-binding (CREB) protein family of transcription factors. This gene is induced by a variety of signals, including many of those encountered by cancer cells, and is involved in the complex process of cellular stress response. Multiple transcript variants encoding different isoforms have been found for this gene. It is possible that alternative splicing of this gene may be physiologically important in the regulation of target genes. 467 NA
QKI, KH domain containing, RNA binding ENSG00000112531 QKI The protein encoded by this gene is an RNA-binding protein that regulates pre-mRNA splicing, export of mRNAs from the nucleus, protein translation, and mRNA stability. The encoded protein is involved in myelinization and oligodendrocyte differentiation and may play a role in schizophrenia. Multiple transcript variants encoding different isoforms have been found for this gene. 9444 NA
regenerating family member 1 alpha ENSG00000115386 REG1A This gene is a type I subclass member of the Reg gene family. The Reg gene family is a multigene family grouped into four subclasses, types I, II, III and IV, based on the primary structures of the encoded proteins. This gene encodes a protein that is secreted by the exocrine pancreas. It is associated with islet cell regeneration and diabetogenesis and may be involved in pancreatic lithogenesis. Reg family members REG1B, REGL, PAP and this gene are tandemly clustered on chromosome 2p12 and may have arisen from the same ancestral gene by gene duplication. 5967 NA
transforming growth factor beta induced ENSG00000120708 TGFBI This gene encodes an RGD-containing protein that binds to type I, II and IV collagens. The RGD motif is found in many extracellular matrix proteins modulating cell adhesion and serves as a ligand recognition sequence for several integrins. This protein plays a role in cell-collagen interactions and may be involved in endochondral bone formation in cartilage. The protein is induced by transforming growth factor-beta and acts to inhibit cell adhesion. Mutations in this gene are associated with multiple types of corneal dystrophy. 7045 NA
protease, serine 1 ENSG00000204983 PRSS1 This gene encodes a trypsinogen, which is a member of the trypsin family of serine proteases. This enzyme is secreted by the pancreas and cleaved to its active form in the small intestine. It is active on peptide linkages involving the carboxyl group of lysine or arginine. Mutations in this gene are associated with hereditary pancreatitis. This gene and several other trypsinogen genes are localized to the T cell receptor beta locus on chromosome 7. 5644 NA
heterogeneous nuclear ribonucleoprotein A3 ENSG00000170144 HNRNPA3 NA 220988 NA
2’-5’-oligoadenylate synthetase 2 ENSG00000111335 OAS2 This gene encodes a member of the 2-5A synthetase family, essential proteins involved in the innate immune response to viral infection. The encoded protein is induced by interferons and uses adenosine triphosphate in 2’-specific nucleotidyl transfer reactions to synthesize 2’,5’-oligoadenylates (2-5As). These molecules activate latent RNase L, which results in viral RNA degradation and the inhibition of viral replication. The three known members of this gene family are located in a cluster on chromosome 12. Alternatively spliced transcript variants encoding different isoforms have been described. 4939 NA
phosphatidylinositol-5-phosphate 4-kinase type 2 alpha ENSG00000150867 PIP4K2A Phosphatidylinositol-5,4-bisphosphate, the precursor to second messengers of the phosphoinositide signal transduction pathways, is thought to be involved in the regulation of secretion, cell proliferation, differentiation, and motility. The protein encoded by this gene is one of a family of enzymes capable of catalyzing the phosphorylation of phosphatidylinositol-5-phosphate on the fourth hydroxyl of the myo-inositol ring to form phosphatidylinositol-5,4-bisphosphate. The amino acid sequence of this enzyme does not show homology to other kinases, but the recombinant protein does exhibit kinase activity. This gene is a member of the phosphatidylinositol-5-phosphate 4-kinase family. 5305 NA
selectin P ligand ENSG00000110876 SELPLG This gene encodes a glycoprotein that functions as a high affinity counter-receptor for the cell adhesion molecules P-, E- and L- selectin expressed on myeloid cells and stimulated T lymphocytes. As such, this protein plays a critical role in leukocyte trafficking during inflammation by tethering of leukocytes to activated platelets or endothelia expressing selectins. This protein requires two post-translational modifications, tyrosine sulfation and the addition of the sialyl Lewis x tetrasaccharide (sLex) to its O-linked glycans, for its high-affinity binding activity. Aberrant expression of this gene and polymorphisms in this gene are associated with defects in the innate and adaptive immune response. Alternate splicing results in multiple transcript variants. 6404 NA
keratin 10 ENSG00000186395 KRT10 This gene encodes a member of the type I (acidic) cytokeratin family, which belongs to the superfamily of intermediate filament (IF) proteins. Keratins are heteropolymeric structural proteins which form the intermediate filament. These filaments, along with actin microfilaments and microtubules, compose the cytoskeleton of epithelial cells. Mutations in this gene are associated with epidermolytic hyperkeratosis. This gene is located within a cluster of keratin family members on chromosome 17q21. 3858 NA
OS9, endoplasmic reticulum lectin ENSG00000135506 OS9 This gene encodes a protein that is highly expressed in osteosarcomas. This protein binds to the hypoxia-inducible factor 1 (HIF-1), a key regulator of the hypoxic response and angiogenesis, and promotes the degradation of one of its subunits. Alternate transcriptional splice variants, encoding different isoforms, have been characterized. 10956 NA
lysophosphatidic acid receptor 6 ENSG00000139679 LPAR6 The protein encoded by this gene belongs to the family of G-protein coupled receptors, that are preferentially activated by adenosine and uridine nucleotides. This gene aligns with an internal intron of the retinoblastoma susceptibility gene in the reverse orientation. Alternative splicing results in multiple transcript variants. 10161 NA
lipase F, gastric type ENSG00000182333 LIPF This gene encodes gastric lipase, an enzyme involved in the digestion of dietary triglycerides in the gastrointestinal tract, and responsible for 30% of fat digestion processes occurring in human. It is secreted by gastric chief cells in the fundic mucosa of the stomach, and it hydrolyzes the ester bonds of triglycerides under acidic pH conditions. The gene is a member of a conserved gene family of lipases that play distinct roles in neutral lipid metabolism. Several transcript variants encoding different isoforms have been found for this gene. 8513 NA
TAR DNA binding protein ENSG00000120948 TARDBP HIV-1, the causative agent of acquired immunodeficiency syndrome (AIDS), contains an RNA genome that produces a chromosomally integrated DNA during the replicative cycle. Activation of HIV-1 gene expression by the transactivator Tat is dependent on an RNA regulatory element (TAR) located downstream of the transcription initiation site. The protein encoded by this gene is a transcriptional repressor that binds to chromosomally integrated TAR DNA and represses HIV-1 transcription. In addition, this protein regulates alternate splicing of the CFTR gene. A similar pseudogene is present on chromosome 20. 23435 NA
adipocyte plasma membrane associated protein ENSG00000101474 APMAP NA 57136 NA
phosphatidylinositol-4-phosphate 3-kinase catalytic subunit type 2 beta ENSG00000133056 PIK3C2B The protein encoded by this gene belongs to the phosphoinositide 3-kinase (PI3K) family. PI3-kinases play roles in signaling pathways involved in cell proliferation, oncogenic transformation, cell survival, cell migration, and intracellular protein trafficking. This protein contains a lipid kinase catalytic domain as well as a C-terminal C2 domain, a characteristic of class II PI3-kinases. C2 domains act as calcium-dependent phospholipid binding motifs that mediate translocation of proteins to membranes, and may also mediate protein-protein interactions. The PI3-kinase activity of this protein is sensitive to low nanomolar levels of the inhibitor wortmannin. The C2 domain of this protein was shown to bind phospholipids but not Ca2+, which suggests that this enzyme may function in a calcium-independent manner. 5287 NA
myeloid cell nuclear differentiation antigen ENSG00000163563 MNDA The myeloid cell nuclear differentiation antigen (MNDA) is detected only in nuclei of cells of the granulocyte-monocyte lineage. A 200-amino acid region of human MNDA is strikingly similar to a region in the proteins encoded by a family of interferon-inducible mouse genes, designated Ifi-201, Ifi-202, and Ifi-203, that are not regulated in a cell- or tissue-specific fashion. The 1.8-kb MNDA mRNA, which contains an interferon-stimulated response element in the 5-prime untranslated region, was significantly upregulated in human monocytes exposed to interferon alpha. MNDA is located within 2,200 kb of FCER1A, APCS, CRP, and SPTA1. In its pattern of expression and/or regulation, MNDA resembles IFI16, suggesting that these genes participate in blood cell-specific responses to interferons. 4332 NA
syndecan 1 ENSG00000115884 SDC1 The protein encoded by this gene is a transmembrane (type I) heparan sulfate proteoglycan and is a member of the syndecan proteoglycan family. The syndecans mediate cell binding, cell signaling, and cytoskeletal organization and syndecan receptors are required for internalization of the HIV-1 tat protein. The syndecan-1 protein functions as an integral membrane protein and participates in cell proliferation, cell migration and cell-matrix interactions via its receptor for extracellular matrix proteins. Altered syndecan-1 expression has been detected in several different tumor types. While several transcript variants may exist for this gene, the full-length natures of only two have been described to date. These two represent the major variants of this gene and encode the same protein. 6382 NA
chymotrypsin like elastase family member 3A ENSG00000142789 CELA3A Elastases form a subfamily of serine proteases that hydrolyze many proteins in addition to elastin. Humans have six elastase genes which encode the structurally similar proteins elastase 1, 2, 2A, 2B, 3A, and 3B. Unlike other elastases, elastase 3A has little elastolytic activity. Like most of the human elastases, elastase 3A is secreted from the pancreas as a zymogen and, like other serine proteases such as trypsin, chymotrypsin and kallikrein, it has a digestive function in the intestine. Elastase 3A preferentially cleaves proteins after alanine residues. Elastase 3A may also function in the intestinal transport and metabolism of cholesterol. Both elastase 3A and elastase 3B have been referred to as protease E and as elastase 1. 10136 NA
ornithine decarboxylase antizyme 1 ENSG00000104904 OAZ1 The protein encoded by this gene belongs to the ornithine decarboxylase antizyme family, which plays a role in cell growth and proliferation by regulating intracellular polyamine levels. Expression of antizymes requires +1 ribosomal frameshifting, which is enhanced by high levels of polyamines. Antizymes in turn bind to and inhibit ornithine decarboxylase (ODC), the key enzyme in polyamine biosynthesis; thus, completing the auto-regulatory circuit. This gene encodes antizyme 1, the first member of the antizyme family, that has broad tissue distribution, and negatively regulates intracellular polyamine levels by binding to and targeting ODC for degradation, as well as inhibiting polyamine uptake. Antizyme 1 mRNA contains two potential in-frame AUGs; and studies in rat suggest that alternative use of the two translation initiation sites results in N-terminally distinct protein isoforms with different subcellular localization. Alternatively spliced transcript variants have also been noted for this gene. 4946 NA
misshapen like kinase 1 ENSG00000141503 MINK1 This gene encodes a serine/threonine kinase belonging to the germinal center kinase (GCK) family. The protein is structurally similar to the kinases that are related to NIK and may belong to a distinct subfamily of NIK-related kinases within the GCK family. Studies of the mouse homolog indicate an up-regulation of expression in the course of postnatal mouse cerebral development and activation of the cJun N-terminal kinase (JNK) and the p38 pathways. 50488 NA
AHNAK nucleoprotein ENSG00000124942 AHNAK NA 79026 NA
Rap guanine nucleotide exchange factor 5 ENSG00000136237 RAPGEF5 Members of the RAS (see HRAS; MIM 190020) subfamily of GTPases function in signal transduction as GTP/GDP-regulated switches that cycle between inactive GDP- and active GTP-bound states. Guanine nucleotide exchange factors (GEFs), such as RAPGEF5, serve as RAS activators by promoting acquisition of GTP to maintain the active GTP-bound state and are the key link between cell surface receptors and RAS activation (Rebhun et al., 2000 [PubMed 10934204]). 9771 NA
intercellular adhesion molecule 3 ENSG00000076662 ICAM3 The protein encoded by this gene is a member of the intercellular adhesion molecule (ICAM) family. All ICAM proteins are type I transmembrane glycoproteins, contain 2-9 immunoglobulin-like C2-type domains, and bind to the leukocyte adhesion LFA-1 protein. This protein is constitutively and abundantly expressed by all leucocytes and may be the most important ligand for LFA-1 in the initiation of the immune response. It functions not only as an adhesion molecule, but also as a potent signalling molecule. Alternative splicing results in multiple transcript variants encoding different isoforms. 3385 NA
baculoviral IAP repeat containing 3 ENSG00000023445 BIRC3 This gene encodes a member of the IAP family of proteins that inhibit apoptosis by binding to tumor necrosis factor receptor-associated factors TRAF1 and TRAF2, probably by interfering with activation of ICE-like proteases. The encoded protein inhibits apoptosis induced by serum deprivation but does not affect apoptosis resulting from exposure to menadione, a potent inducer of free radicals. It contains 3 baculovirus IAP repeats and a ring finger domain. Transcript variants encoding the same isoform have been identified. 330 NA
peripheral myelin protein 22 ENSG00000109099 PMP22 This gene encodes an integral membrane protein that is a major component of myelin in the peripheral nervous system. Studies suggest two alternately used promoters drive tissue-specific expression. Various mutations of this gene are causes of Charcot-Marie-Tooth disease Type IA, Dejerine-Sottas syndrome, and hereditary neuropathy with liability to pressure palsies. Alternative splicing results in multiple transcript variants. 5376 NA
NA ENSG00000140181 NA NA NA TRUE
albumin ENSG00000163631 ALB Albumin is a soluble, monomeric protein which comprises about one-half of the blood serum protein. Albumin functions primarily as a carrier protein for steroids, fatty acids, and thyroid hormones and plays a role in stabilizing extracellular fluid volume. Albumin is a globular unglycosylated serum protein of molecular weight 65,000. Albumin is synthesized in the liver as preproalbumin which has an N-terminal peptide that is removed before the nascent protein is released from the rough endoplasmic reticulum. The product, proalbumin, is in turn cleaved in the Golgi vesicles to produce the secreted albumin. 213 NA
NA ENSG00000259716 NA NA NA TRUE
write.table(as.factor(out$query), paste0("../utilities/GTEX2013_sparse_fac_sqrt/gene_names_clus_",17,".txt"), col.names = FALSE,
            row.names=FALSE, quote=FALSE);
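
The annotation tables in this report do not always share the same column order, and some queried Ensembl IDs come back with `notfound` set to `TRUE`. A minimal sketch of one way to tidy a result before printing or writing it is shown below. The helper name `tidy_annotations` and the toy input are hypothetical and are not part of the original analysis; the sketch assumes only the columns visible in the tables here (`query`, `symbol`, `name`, `summary`, and an optional `notfound`).

```r
# Hypothetical helper (not part of the original analysis): put the columns of
# a mygene::queryMany() result in a fixed order and split off unmatched IDs.
tidy_annotations <- function(out) {
  df <- as.data.frame(out)
  # Assumption based on the tables in this report: `notfound` is only
  # present when at least one queried ID had no match.
  if (!"notfound" %in% names(df)) df$notfound <- NA
  missing <- !is.na(df$notfound) & df$notfound
  wanted <- c("query", "symbol", "name", "summary")
  # Add any absent column as NA so the output shape is always the same
  for (col in setdiff(wanted, names(df))) df[[col]] <- NA_character_
  list(found    = df[!missing, wanted, drop = FALSE],
       notfound = as.character(df$query[missing]))
}

# Toy input mimicking the shape of the table above (values copied from it)
toy <- data.frame(
  name     = c("albumin", NA),
  query    = c("ENSG00000163631", "ENSG00000259716"),
  symbol   = c("ALB", NA),
  notfound = c(NA, TRUE),
  stringsAsFactors = FALSE
)
res <- tidy_annotations(toy)
```

With a real query result, `res$found` could be passed to `kable()` and `res$notfound` inspected separately. Whether unmatched IDs should also be left out of the exported gene list is a choice for the analyst; the original code writes every queried ID.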

Factor 18 Annotations

out <- mygene::queryMany(gene_list[18,],  scopes="ensembl.gene", fields=c("name", "summary", "symbol"), species="human");
## Finished
## Pass returnall=TRUE to return lists of duplicate or missing query terms.
kable(as.data.frame(out))
summary X_id query symbol name notfound
The alpha (HBA) and beta (HBB) loci determine the structure of the 2 types of polypeptide chains in adult hemoglobin, Hb A. The normal adult hemoglobin tetramer consists of two alpha chains and two beta chains. Mutant beta globin causes sickle cell anemia. Absence of beta chain causes beta-zero-thalassemia. Reduced amounts of detectable beta globin causes beta-plus-thalassemia. The order of the genes in the beta-globin cluster is 5’-epsilon – gamma-G – gamma-A – delta – beta–3’. 3043 ENSG00000244734 HBB hemoglobin subunit beta NA
The protein encoded by this gene belongs to the glutamine synthetase family. It catalyzes the synthesis of glutamine from glutamate and ammonia in an ATP-dependent reaction. This protein plays a role in ammonia and glutamate detoxification, acid-base homeostasis, cell signaling, and cell proliferation. Glutamine is an abundant amino acid, and is important to the biosynthesis of several amino acids, pyrimidines, and purines. Mutations in this gene are associated with congenital glutamine deficiency, and overexpression of this gene was observed in some primary liver cancer samples. There are six pseudogenes of this gene found on chromosomes 2, 5, 9, 11, and 12. Alternative splicing results in multiple transcript variants. 2752 ENSG00000135821 GLUL glutamate-ammonia ligase NA
This gene product belongs to the glutathione peroxidase family, which functions in the detoxification of hydrogen peroxide. It contains a selenocysteine (Sec) residue at its active site. The selenocysteine is encoded by the UGA codon, which normally signals translation termination. The 3’ UTR of Sec-containing genes have a common stem-loop structure, the sec insertion sequence (SECIS), which is necessary for the recognition of UGA as a Sec codon rather than as a stop signal. 2878 ENSG00000211445 GPX3 glutathione peroxidase 3 NA
The human alpha globin gene cluster located on chromosome 16 spans about 30 kb and includes seven loci: 5’- zeta - pseudozeta - mu - pseudoalpha-1 - alpha-2 - alpha-1 - theta - 3’. The alpha-2 (HBA2) and alpha-1 (HBA1) coding sequences are identical. These genes differ slightly over the 5’ untranslated regions and the introns, but they differ significantly over the 3’ untranslated regions. Two alpha chains plus two beta chains constitute HbA, which in normal adult life comprises about 97% of the total hemoglobin; alpha chains combine with delta chains to constitute HbA-2, which with HbF (fetal hemoglobin) makes up the remaining 3% of adult hemoglobin. Alpha thalassemias result from deletions of each of the alpha genes as well as deletions of both HBA2 and HBA1; some nondeletion alpha thalassemias have also been reported. 3040 ENSG00000188536 HBA2 hemoglobin subunit alpha 2 NA
This gene encodes the anterior pituitary hormone prolactin. This secreted hormone is a growth regulator for many tissues, including cells of the immune system. It may also play a role in cell survival by suppressing apoptosis, and it is essential for lactation. Alternative splicing results in multiple transcript variants that encode the same protein. 5617 ENSG00000172179 PRL prolactin NA
NA 79026 ENSG00000124942 AHNAK AHNAK nucleoprotein NA
LPL encodes lipoprotein lipase, which is expressed in heart, muscle, and adipose tissue. LPL functions as a homodimer, and has the dual functions of triglyceride hydrolase and ligand/bridging factor for receptor-mediated lipoprotein uptake. Severe mutations that cause LPL deficiency result in type I hyperlipoproteinemia, while less extreme mutations in LPL are linked to many disorders of lipoprotein metabolism. 4023 ENSG00000175445 LPL lipoprotein lipase NA
The protein encoded by this gene belongs to the family of latent transforming growth factor (TGF)-beta binding proteins (LTBP), which are extracellular matrix proteins with multi-domain structure. This protein is the largest member of the LTBP family possessing unique regions and with most similarity to the fibrillins. It has thus been suggested that it may have multiple functions: as a member of the TGF-beta latent complex, as a structural component of microfibrils, and a role in cell adhesion. 4053 ENSG00000119681 LTBP2 latent transforming growth factor beta binding protein 2 NA
The enzyme encoded by this gene is a multifunctional protein. Its main function is to catalyze the synthesis of palmitate from acetyl-CoA and malonyl-CoA, in the presence of NADPH, into long-chain saturated fatty acids. In some cancer cell lines, this protein has been found to be fused with estrogen receptor-alpha (ER-alpha), in which the N-terminus of FAS is fused in-frame with the C-terminus of ER-alpha. 2194 ENSG00000169710 FASN fatty acid synthase NA
This gene encodes a glycoprotein involved in hemostasis. The encoded preproprotein is proteolytically processed following assembly into large multimeric complexes. These complexes function in the adhesion of platelets to sites of vascular injury and the transport of various proteins in the blood. Mutations in this gene result in von Willebrand disease, an inherited bleeding disorder. An unprocessed pseudogene has been found on chromosome 22. 7450 ENSG00000110799 VWF von Willebrand factor NA
The protein encoded by this gene is a member of the somatotropin/prolactin family of hormones which play an important role in growth control. The gene, along with four other related genes, is located at the growth hormone locus on chromosome 17 where they are interspersed in the same transcriptional orientation; an arrangement which is thought to have evolved by a series of gene duplications. The five genes share a remarkably high degree of sequence identity. Alternative splicing generates additional isoforms of each of the five growth hormones, leading to further diversity and potential for specialization. This particular family member is expressed in the pituitary but not in placental tissue as is the case for the other four genes in the growth hormone locus. Mutations in or deletions of the gene lead to growth hormone deficiency and short stature. 2688 ENSG00000259384 GH1 growth hormone 1 NA
Acetyl-CoA carboxylase (ACC) is a complex multifunctional enzyme system. ACC is a biotin-containing enzyme which catalyzes the carboxylation of acetyl-CoA to malonyl-CoA, the rate-limiting step in fatty acid synthesis. ACC-beta is thought to control fatty acid oxidation by means of the ability of malonyl-CoA to inhibit carnitine-palmitoyl-CoA transferase I, the rate-limiting step in fatty acid uptake and oxidation by mitochondria. ACC-beta may be involved in the regulation of fatty acid oxidation, rather than fatty acid biosynthesis. There is evidence for the presence of two ACC-beta isoforms. 32 ENSG00000076555 ACACB acetyl-CoA carboxylase beta NA
NA NA ENSG00000117289 NA NA TRUE
Members of the perilipin family, such as PLIN4, coat intracellular lipid storage droplets (Wolins et al., 2003 [PubMed 12840023]). 729359 ENSG00000167676 PLIN4 perilipin 4 NA
This locus has a highly complex imprinted expression pattern. It gives rise to maternally, paternally, and biallelically expressed transcripts that are derived from four alternative promoters and 5’ exons. Some transcripts contain a differentially methylated region (DMR) at their 5’ exons, and this DMR is commonly found in imprinted genes and correlates with transcript expression. An antisense transcript is produced from an overlapping locus on the opposite strand. One of the transcripts produced from this locus, and the antisense transcript, are paternally expressed noncoding RNAs, and may regulate imprinting in this region. In addition, one of the transcripts contains a second overlapping ORF, which encodes a structurally unrelated protein - Alex. Alternative splicing of downstream exons is also observed, which results in different forms of the stimulatory G-protein alpha subunit, a key element of the classical signal transduction pathway linking receptor-ligand interactions with the activation of adenylyl cyclase and a variety of cellular responses. Multiple transcript variants encoding different isoforms have been found for this gene. Mutations in this gene result in pseudohypoparathyroidism type 1a, pseudohypoparathyroidism type 1b, Albright hereditary osteodystrophy, pseudopseudohypoparathyroidism, McCune-Albright syndrome, progressive osseous heteroplasia, polyostotic fibrous dysplasia of bone, and some pituitary tumors. 2778 ENSG00000087460 GNAS GNAS complex locus NA
The protein encoded by this gene is a mechanically-activated ion channel that links mechanical forces to biological signals. The encoded protein contains 36 transmembrane domains and functions as a homotetramer. Defects in this gene have been associated with dehydrated hereditary stomatocytosis. 9780 ENSG00000103335 PIEZO1 piezo type mechanosensitive ion channel component 1 NA
Laminins, a family of extracellular matrix glycoproteins, are the major noncollagenous constituent of basement membranes. They have been implicated in a wide variety of biological processes including cell adhesion, differentiation, migration, signaling, neurite outgrowth and metastasis. Laminins are composed of 3 non identical chains: laminin alpha, beta and gamma (formerly A, B1, and B2, respectively) and they form a cruciform structure consisting of 3 short arms, each formed by a different chain, and a long arm composed of all 3 chains. Each laminin chain is a multidomain protein encoded by a distinct gene. Several isoforms of each chain have been described. Different alpha, beta and gamma chain isomers combine to give rise to different heterotrimeric laminin isoforms which are designated by Arabic numerals in the order of their discovery, i.e. alpha1beta1gamma1 heterotrimer is laminin 1. The biological functions of the different chains and trimer molecules are largely unknown, but some of the chains have been shown to differ with respect to their tissue distribution, presumably reflecting diverse functions in vivo. This gene encodes the beta chain isoform laminin, beta 1. The beta 1 chain has 7 structurally distinct domains which it shares with other beta chain isomers. The C-terminal helical region containing domains I and II are separated by domain alpha, domains III and V contain several EGF-like repeats, and domains IV and VI have a globular conformation. Laminin, beta 1 is expressed in most tissues that produce basement membranes, and is one of the 3 chains constituting laminin 1, the first laminin isolated from Engelbreth-Holm-Swarm (EHS) tumor. A sequence in the beta 1 chain that is involved in cell attachment, chemotaxis, and binding to the laminin receptor was identified and shown to have the capacity to inhibit metastasis. 3912 ENSG00000091136 LAMB1 laminin subunit beta 1 NA
This gene encodes a preproprotein that undergoes extensive, tissue-specific, post-translational processing via cleavage by subtilisin-like enzymes known as prohormone convertases. There are eight potential cleavage sites within the preproprotein and, depending on tissue type and the available convertases, processing may yield as many as ten biologically active peptides involved in diverse cellular functions. The encoded protein is synthesized mainly in corticotroph cells of the anterior pituitary where four cleavage sites are used; adrenocorticotrophin, essential for normal steroidogenesis and the maintenance of normal adrenal weight, and lipotropin beta are the major end products. In other tissues, including the hypothalamus, placenta, and epithelium, all cleavage sites may be used, giving rise to peptides with roles in pain and energy homeostasis, melanocyte stimulation, and immune modulation. These include several distinct melanotropins, lipotropins, and endorphins that are contained within the adrenocorticotrophin and beta-lipotropin peptides. The antimicrobial melanotropin alpha peptide exhibits antibacterial and antifungal activity. Mutations in this gene have been associated with early onset obesity, adrenal insufficiency, and red hair pigmentation. Alternatively spliced transcript variants encoding the same protein have been described. 5443 ENSG00000115138 POMC proopiomelanocortin NA
NA ENSG00000251322 ENSG00000251322 SHANK3 SH3 and multiple ankyrin repeat domains 3 NA
The protein encoded by this gene is a member of the scavenger receptor cysteine-rich (SRCR) superfamily, and is exclusively expressed in monocytes and macrophages. It functions as an acute phase-regulated receptor involved in the clearance and endocytosis of hemoglobin/haptoglobin complexes by macrophages, and may thereby protect tissues from free hemoglobin-mediated oxidative damage. This protein may also function as an innate immune sensor for bacteria and inducer of local inflammation. Alternatively spliced transcript variants encoding different isoforms have been described for this gene. 9332 ENSG00000177575 CD163 CD163 molecule NA
FABP4 encodes the fatty acid binding protein found in adipocytes. Fatty acid binding proteins are a family of small, highly conserved, cytoplasmic proteins that bind long-chain fatty acids and other hydrophobic ligands. It is thought that the roles of FABPs include fatty acid uptake, transport, and metabolism. 2167 ENSG00000170323 FABP4 fatty acid binding protein 4 NA
The human alpha globin gene cluster located on chromosome 16 spans about 30 kb and includes seven loci: 5’- zeta - pseudozeta - mu - pseudoalpha-1 - alpha-2 - alpha-1 - theta - 3’. The alpha-2 (HBA2) and alpha-1 (HBA1) coding sequences are identical. These genes differ slightly over the 5’ untranslated regions and the introns, but they differ significantly over the 3’ untranslated regions. Two alpha chains plus two beta chains constitute HbA, which in normal adult life comprises about 97% of the total hemoglobin; alpha chains combine with delta chains to constitute HbA-2, which with HbF (fetal hemoglobin) makes up the remaining 3% of adult hemoglobin. Alpha thalassemias result from deletions of each of the alpha genes as well as deletions of both HBA2 and HBA1; some nondeletion alpha thalassemias have also been reported. 3039 ENSG00000206172 HBA1 hemoglobin subunit alpha 1 NA
This gene encodes a member of the nidogen family of basement membrane glycoproteins. The protein interacts with several other components of basement membranes, and may play a role in cell interactions with the extracellular matrix. 4811 ENSG00000116962 NID1 nidogen 1 NA
This locus may represent a breast cancer candidate gene. It is located close to FGFR1 on a region of chromosome 8 that is amplified in some breast cancers. Three transcript variants encoding different isoforms have been found for this gene. 6867 ENSG00000147526 TACC1 transforming acidic coiled-coil containing protein 1 NA
This gene belongs to the TIMP gene family. The proteins encoded by this gene family are inhibitors of the matrix metalloproteinases, a group of peptidases involved in degradation of the extracellular matrix (ECM). Expression of this gene is induced in response to mitogenic stimulation and this netrin domain-containing protein is localized to the ECM. Mutations in this gene have been associated with the autosomal dominant disorder Sorsby’s fundus dystrophy. 7078 ENSG00000100234 TIMP3 TIMP metallopeptidase inhibitor 3 NA
NA ENSG00000225630 ENSG00000225630 MTND2P28 mitochondrially encoded NADH:ubiquinone oxidoreductase core subunit 2 pseudogene 28 NA
This gene encodes a member of the insulin-like growth factor (IGF)-binding protein (IGFBP) family. IGFBPs bind IGFs with high affinity, and regulate IGF availability in body fluids and tissues and modulate IGF binding to its receptors. This protein binds IGF-I and IGF-II with relatively low affinity, and belongs to a subfamily of low-affinity IGFBPs. It also stimulates prostacyclin production and cell adhesion. Alternatively spliced transcript variants encoding different isoforms have been described for this gene, and one variant has been associated with retinal arterial macroaneurysm (PMID:21835307). 3490 ENSG00000163453 IGFBP7 insulin like growth factor binding protein 7 NA
This gene encodes one of the two alpha chains of type VIII collagen. The gene product is a short chain collagen and a major component of the basement membrane of the corneal endothelium. The type VIII collagen fibril can be either a homo- or a heterotrimer. Alternatively spliced transcript variants encoding the same protein have been observed. 1295 ENSG00000144810 COL8A1 collagen type VIII alpha 1 NA
This gene encodes a large, transmembrane receptor protein which may function in angiogenesis, lymphocyte homing, cell adhesion, or receptor scavenging. The protein contains 7 fasciclin, 16 epidermal growth factor (EGF)-like, and 2 laminin-type EGF-like domains as well as a C-type lectin-like hyaluronan-binding Link module. The protein is primarily expressed on sinusoidal endothelial cells of liver, spleen, and lymph node. The receptor has been shown to endocytose ligands such as low density lipoprotein, Gram-positive and Gram-negative bacteria, and advanced glycosylation end products. Supporting its possible role as a scavenger receptor, the protein rapidly cycles between the plasma membrane and early endosomes. 23166 ENSG00000010327 STAB1 stabilin 1 NA
This gene encodes a major constituent of the human complement subcomponent C1q. C1q associates with C1r and C1s in order to yield the first component of the serum complement system. Deficiency of C1q has been associated with lupus erythematosus and glomerulonephritis. C1q is composed of 18 polypeptide chains: six A-chains, six B-chains, and six C-chains. Each chain contains a collagen-like region located near the N terminus and a C-terminal globular region. The A-, B-, and C-chains are arranged in the order A-C-B on chromosome 1. This gene encodes the B-chain polypeptide of human complement subcomponent C1q. 713 ENSG00000173369 C1QB complement component 1, q subcomponent, B chain NA
This gene encodes the beta subunit of prolyl 4-hydroxylase, a highly abundant multifunctional enzyme that belongs to the protein disulfide isomerase family. When present as a tetramer consisting of two alpha and two beta subunits, this enzyme is involved in hydroxylation of prolyl residues in preprocollagen. This enzyme is also a disulfide isomerase containing two thioredoxin domains that catalyze the formation, breakage and rearrangement of disulfide bonds. Other known functions include its ability to act as a chaperone that inhibits aggregation of misfolded proteins in a concentration-dependent manner, its ability to bind thyroid hormone, its role in both the influx and efflux of S-nitrosothiol-bound nitric oxide, and its function as a subunit of the microsomal triglyceride transfer protein complex. 5034 ENSG00000185624 P4HB prolyl 4-hydroxylase subunit beta NA
This gene encodes a member of the fibulin family of extracellular matrix glycoproteins. Like all members of this family, the encoded protein contains tandemly repeated epidermal growth factor-like repeats followed by a C-terminus fibulin-type domain. This gene is upregulated in malignant gliomas and may play a role in the aggressive nature of these tumors. Mutations in this gene are associated with Doyne honeycomb retinal dystrophy. Alternatively spliced transcript variants that encode the same protein have been described. 2202 ENSG00000115380 EFEMP1 EGF containing fibulin like extracellular matrix protein 1 NA
This gene encodes a tyrosine-sulfated secretory protein abundant in peptidergic endocrine cells and neurons. This protein may serve as a precursor for regulatory peptides. 1114 ENSG00000089199 CHGB chromogranin B NA
Myosin, a structural component of muscle, consists of two heavy chains and four light chains. The protein encoded by this gene is a myosin light chain that may regulate muscle contraction by modulating the ATPase activity of myosin heads. The encoded protein binds calcium and is activated by myosin light chain kinase. Two transcript variants encoding different isoforms have been found for this gene. 10398 ENSG00000101335 MYL9 myosin light chain 9 NA
The protein encoded by this gene is the fourth major glycoprotein of the platelet surface and serves as a receptor for thrombospondin in platelets and various cell lines. Since thrombospondins are widely distributed proteins involved in a variety of adhesive processes, this protein may have important functions as a cell adhesion molecule. It binds to collagen, thrombospondin, anionic phospholipids and oxidized LDL. It directly mediates cytoadherence of Plasmodium falciparum parasitized erythrocytes and it binds long chain fatty acids and may function in the transport and/or as a regulator of fatty acid transport. Mutations in this gene cause platelet glycoprotein deficiency. Multiple alternatively spliced transcript variants have been found for this gene. 948 ENSG00000135218 CD36 CD36 molecule NA
The protein encoded by this gene associates with class II major histocompatibility complex (MHC) and is an important chaperone that regulates antigen presentation for immune response. It also serves as cell surface receptor for the cytokine macrophage migration inhibitory factor (MIF) which, when bound to the encoded protein, initiates survival pathways and cell proliferation. This protein also interacts with amyloid precursor protein (APP) and suppresses the production of amyloid beta (Abeta). Multiple alternatively spliced transcript variants encoding different isoforms have been identified. 972 ENSG00000019582 CD74 CD74 molecule NA
This gene encodes a member of the claudin family. Claudins are integral membrane proteins and components of tight junction strands. Tight junction strands serve as a physical barrier to prevent solutes and water from passing freely through the paracellular space between epithelial or endothelial cell sheets. Mutations in this gene have been found in patients with velocardiofacial syndrome. Alternatively spliced transcript variants encoding the same protein have been found for this gene. 7122 ENSG00000184113 CLDN5 claudin 5 NA
Spectrin is an actin crosslinking and molecular scaffold protein that links the plasma membrane to the actin cytoskeleton, and functions in the determination of cell shape, arrangement of transmembrane proteins, and organization of organelles. It is composed of two antiparallel dimers of alpha- and beta-subunits. This gene is one member of a family of beta-spectrin genes. The encoded protein contains an N-terminal actin-binding domain, and 17 spectrin repeats which are involved in dimer formation. Multiple transcript variants encoding different isoforms have been found for this gene. 6711 ENSG00000115306 SPTBN1 spectrin beta, non-erythrocytic 1 NA
This gene is a member of the immunoglobulin superfamily. The encoded poly-Ig receptor binds polymeric immunoglobulin molecules at the basolateral surface of epithelial cells; the complex is then transported across the cell to be secreted at the apical surface. A significant association was found between immunoglobulin A nephropathy and several SNPs in this gene. 5284 ENSG00000162896 PIGR polymeric immunoglobulin receptor NA
The protein encoded by this gene is a member of the phospholipase A2 family (PLA2). PLA2s constitute a diverse family of enzymes with respect to sequence, function, localization, and divalent cation requirements. This gene product belongs to group II, which contains the secreted form of PLA2, an extracellular enzyme that has a low molecular mass and requires calcium ions for catalysis. It catalyzes the hydrolysis of the sn-2 fatty acid acyl ester bond of phosphoglycerides, releasing free fatty acids and lysophospholipids, and is thought to participate in the regulation of the phospholipid metabolism in biomembranes. Several alternatively spliced transcript variants with different 5’ UTRs have been found for this gene. 5320 ENSG00000188257 PLA2G2A phospholipase A2 group IIA NA
Amyloid precursor proteins are processed by beta-secretase and gamma-secretase to produce beta-amyloid peptides which form the characteristic plaques of Alzheimer disease. This gene encodes a transmembrane protein which is processed at the C-terminus by furin or furin-like proteases to produce a small secreted peptide which inhibits the deposition of beta-amyloid. Mutations which result in extension of the C-terminal end of the encoded protein, thereby increasing the size of the secreted peptide, are associated with two neurodegenerative diseases, familial British dementia and familial Danish dementia. 9445 ENSG00000136156 ITM2B integral membrane protein 2B NA
The protein encoded by this gene is a glycosylated membrane protein and a non-specific receptor for several chemokines. The encoded protein is the receptor for the human malarial parasites Plasmodium vivax and Plasmodium knowlesi. Polymorphisms in this gene are the basis of the Duffy blood group system. Two transcript variants encoding different isoforms have been found for this gene. 2532 ENSG00000213088 ACKR1 atypical chemokine receptor 1 (Duffy blood group) NA
This gene encodes a major constituent of the human complement subcomponent C1q. C1q associates with C1r and C1s in order to yield the first component of the serum complement system. A deficiency in C1q has been associated with lupus erythematosus and glomerulonephritis. C1q is composed of 18 polypeptide chains: six A-chains, six B-chains, and six C-chains. Each chain contains a collagen-like region located near the N-terminus, and a C-terminal globular region. The A-, B-, and C-chains are arranged in the order A-C-B on chromosome 1. This gene encodes the C-chain polypeptide of human complement subcomponent C1q. Alternatively spliced transcript variants that encode the same protein have been found for this gene. 714 ENSG00000159189 C1QC complement component 1, q subcomponent, C chain NA
This gene is a member of the aggrecan/versican proteoglycan family. The protein encoded is a large chondroitin sulfate proteoglycan and is a major component of the extracellular matrix. This protein is involved in cell adhesion, proliferation, migration and angiogenesis and plays a central role in tissue morphogenesis and maintenance. Mutations in this gene are the cause of Wagner syndrome type 1. Multiple transcript variants encoding different isoforms have been found for this gene. 1462 ENSG00000038427 VCAN versican NA
Fibromodulin belongs to the family of small interstitial proteoglycans. The encoded protein possesses a central region containing leucine-rich repeats with 4 keratan sulfate chains, flanked by terminal domains containing disulphide bonds. Owing to the interaction with type I and type II collagen fibrils and in vitro inhibition of fibrillogenesis, the encoded protein may play a role in the assembly of extracellular matrix. It may also regulate TGF-beta activities by sequestering TGF-beta into the extracellular matrix. Sequence variations in this gene may be associated with the pathogenesis of high myopia. Alternative splicing results in multiple transcript variants. 2331 ENSG00000122176 FMOD fibromodulin NA
The protein encoded by this gene belongs to the family of latent TGF-beta binding proteins (LTBPs). The secretion and activation of TGF-betas is regulated by their association with latency-associated proteins and with latent TGF-beta binding proteins. The product of this gene targets latent complexes of transforming growth factor beta to the extracellular matrix, where the latent cytokine is subsequently activated by several different mechanisms. Alternatively spliced transcript variants encoding different isoforms have been identified. 4052 ENSG00000049323 LTBP1 latent transforming growth factor beta binding protein 1 NA
This gene encodes a member of the regulators of G protein signaling (RGS) family. The RGS proteins are signal transduction molecules which are involved in the regulation of heterotrimeric G proteins by acting as GTPase activators. This gene is a hypoxia-inducible factor-1 dependent, hypoxia-induced gene which is involved in the induction of endothelial apoptosis. This gene is also one of three genes on chromosome 1q contributing to elevated blood pressure. Alternatively spliced transcript variants have been identified. 8490 ENSG00000143248 RGS5 regulator of G-protein signaling 5 NA
NA 4642 ENSG00000176658 MYO1D myosin ID NA
NA ENSG00000211890 ENSG00000211890 IGHA2 immunoglobulin heavy constant alpha 2 (A2m marker) NA
This gene encodes a member of the unconventional myosin protein family, which are actin-based molecular motors. The protein is found in the cytoplasm, and one isoform with a unique N-terminus is also found in the nucleus. The nuclear isoform associates with RNA polymerase I and II and functions in transcription initiation. The mouse ortholog of this protein also functions in intracellular vesicle transport to the plasma membrane. Multiple transcript variants encoding different isoforms have been found for this gene. The related gene myosin IE has been referred to as myosin IC in the literature, but it is a distinct locus on chromosome 19. 4641 ENSG00000197879 MYO1C myosin IC NA
The protein encoded by this gene regulates inositol phosphate metabolism by phosphorylation of second messenger inositol 1,4,5-trisphosphate to Ins(1,3,4,5)P4. The activity of this encoded protein is responsible for regulating the levels of a large number of inositol polyphosphates that are important in cellular signaling. Both calcium/calmodulin and protein phosphorylation mechanisms control its activity. 3707 ENSG00000143772 ITPKB inositol-trisphosphate 3-kinase B NA
This gene encodes a member of the M14 family of metallocarboxypeptidases. The encoded preproprotein is proteolytically processed to generate the mature peptidase. This peripheral membrane protein cleaves C-terminal amino acid residues and is involved in the biosynthesis of peptide hormones and neurotransmitters, including insulin. This protein may also function independently of its peptidase activity, as a neurotrophic factor that promotes neuronal survival, and as a sorting receptor that binds to regulated secretory pathway proteins, including prohormones. Mutations in this gene are implicated in type 2 diabetes. 1363 ENSG00000109472 CPE carboxypeptidase E NA
The protein encoded by this gene coats lipid storage droplets in adipocytes, thereby protecting them until they can be broken down by hormone-sensitive lipase. The encoded protein is the major cAMP-dependent protein kinase substrate in adipocytes and, when unphosphorylated, may play a role in the inhibition of lipolysis. Alternatively spliced transcript variants varying in the 5’ UTR, but encoding the same protein, have been found for this gene. 5346 ENSG00000166819 PLIN1 perilipin 1 NA
This gene encodes a major constituent of the human complement subcomponent C1q. C1q associates with C1r and C1s in order to yield the first component of the serum complement system. Deficiency of C1q has been associated with lupus erythematosus and glomerulonephritis. C1q is composed of 18 polypeptide chains: six A-chains, six B-chains, and six C-chains. Each chain contains a collagen-like region located near the N terminus and a C-terminal globular region. The A-, B-, and C-chains are arranged in the order A-C-B on chromosome 1. This gene encodes the A-chain polypeptide of human complement subcomponent C1q. 712 ENSG00000173372 C1QA complement component 1, q subcomponent, A chain NA
This gene encodes a member of the myosin superfamily. The protein represents a conventional non-muscle myosin; it should not be confused with the unconventional myosin-10 (MYO10). Myosins are actin-dependent motor proteins with diverse functions including regulation of cytokinesis, cell motility, and cell polarity. Mutations in this gene have been associated with May-Hegglin anomaly and developmental defects in brain and heart. Multiple transcript variants encoding different isoforms have been found for this gene. 4628 ENSG00000133026 MYH10 myosin, heavy chain 10, non-muscle NA
This gene encodes a member of carboxypeptidase A protein family. The encoded protein may function as a transcriptional repressor and play a role in adipogenesis and smooth muscle cell differentiation. Studies in mice suggest that this gene functions in wound healing and abdominal wall development. Overexpression of this gene is associated with glioblastoma. 165 ENSG00000106624 AEBP1 AE binding protein 1 NA
This gene encodes an integral membrane protein that is a major component of myelin in the peripheral nervous system. Studies suggest two alternately used promoters drive tissue-specific expression. Various mutations of this gene are causes of Charcot-Marie-Tooth disease Type IA, Dejerine-Sottas syndrome, and hereditary neuropathy with liability to pressure palsies. Alternative splicing results in multiple transcript variants. 5376 ENSG00000109099 PMP22 peripheral myelin protein 22 NA
The protein encoded by this gene has a long and a short form, generated by use of alternative translational start codons. The long form is expressed in steroidogenic tissues such as testis, where it converts cholesteryl esters to free cholesterol for steroid hormone production. The short form is expressed in adipose tissue, among others, where it hydrolyzes stored triglycerides to free fatty acids. 3991 ENSG00000079435 LIPE lipase E, hormone sensitive type NA
Major alterations in the composition of the cartilage extracellular matrix occur in joint disease, such as osteoarthrosis. This gene encodes the cartilage intermediate layer protein (CILP), which increases in early osteoarthrosis cartilage. The encoded protein was thought to be a precursor for two different proteins: an N-terminal CILP and a C-terminal homolog of NTPPHase; however, later studies identified no nucleotide pyrophosphatase phosphodiesterase (NPP) activity. The full-length protein and its N-terminal domain were shown to function as IGF-1 antagonists. An allelic variant of this gene has been associated with lumbar disc disease. 8483 ENSG00000138615 CILP cartilage intermediate layer protein NA
The protein encoded by this gene is a small secreted cysteine-rich protein and a member of the CCN family of regulatory proteins. CCN family proteins associate with the extracellular matrix and play an important role in cardiovascular and skeletal development, fibrosis and cancer development. 4856 ENSG00000136999 NOV nephroblastoma overexpressed NA
NA 27254 ENSG00000172346 CSDC2 cold shock domain containing C2 NA
This gene encodes a preproprotein that is proteolytically processed to form multiple protein products. The major encoded protein product, lactadherin, is a membrane glycoprotein that promotes phagocytosis of apoptotic cells. This protein has also been implicated in wound healing, autoimmune disease, and cancer. Lactadherin can be further processed to form a smaller cleavage product, medin, which comprises the major protein component of aortic medial amyloid (AMA). Alternative splicing results in multiple transcript variants. 4240 ENSG00000140545 MFGE8 milk fat globule-EGF factor 8 protein NA
This gene encodes a large protein that contains six ankyrin repeats, as well as a Src homology 3 (SH3) domain and two sterile alpha motif (SAM) domains, which may be involved in protein-protein interactions. The C-terminal portion of this protein is proline-rich and contains a conserved region. A related protein interacts with calcium/calmodulin-dependent serine protein kinase (CASK). Alternative splicing results in multiple transcript variants. 57513 ENSG00000177303 CASKIN2 CASK interacting protein 2 NA
This gene encodes a type I cell-surface receptor for the TGF-beta superfamily of ligands. It shares with other type I receptors a high degree of similarity in serine-threonine kinase subdomains, a glycine- and serine-rich region (called the GS domain) preceding the kinase domain, and a short C-terminal tail. The encoded protein, sometimes termed ALK1, shares similar domain structures with other closely related ALK or activin receptor-like kinase proteins that form a subfamily of receptor serine/threonine kinases. Mutations in this gene are associated with hemorrhagic telangiectasia type 2, also known as Rendu-Osler-Weber syndrome 2. 94 ENSG00000139567 ACVRL1 activin A receptor like type 1 NA
This gene encodes an alpha chain for one of the low abundance fibrillar collagens. Fibrillar collagen molecules are trimers that can be composed of one or more types of alpha chains. Type V collagen is found in tissues containing type I collagen and appears to regulate the assembly of heterotypic fibers composed of both type I and type V collagen. This gene product is closely related to type XI collagen and it is possible that the collagen chains of types V and XI constitute a single collagen type with tissue-specific chain combinations. Mutations in this gene are thought to be responsible for the symptoms of a subset of patients with Ehlers-Danlos syndrome type III. Messages of several sizes can be detected in northern blots but sequence information cannot confirm the identity of the shorter messages. 50509 ENSG00000080573 COL5A3 collagen type V alpha 3 NA
This gene encodes a secreted endothelial cell protein that contains two epidermal growth factor-like domains. The encoded protein may play a role in regulating vasculogenesis. This protein may be involved in the growth and proliferation of tumor cells. Alternate splicing results in multiple transcript variants. 51162 ENSG00000172889 EGFL7 EGF like domain multiple 7 NA
This gene may play a role in regulation of the innate immune response. The encoded protein is upregulated in response to viral infection and may be involved in mediation of tumor necrosis factor-alpha proinflammatory responses. Mutations in this gene have been associated with Aicardi-Goutieres syndrome. 25939 ENSG00000101347 SAMHD1 SAM and HD domain containing deoxynucleoside triphosphate triphosphohydrolase 1 NA
This gene encodes an alpha integrin. Integrins are heterodimeric integral membrane proteins composed of an alpha chain and a beta chain. This protein contains an I domain, is expressed in muscle tissue, dimerizes with beta 1 integrin in vitro, and appears to bind collagen in this form. Therefore, the protein may be involved in attaching muscle tissue to the extracellular matrix. Alternative transcriptional splice variants have been found for this gene, but their biological validity is not determined. 22801 ENSG00000137809 ITGA11 integrin subunit alpha 11 NA
The protein encoded by this gene is a member of the transmembrane 4 superfamily, also known as the tetraspanin family. Most of these members are cell-surface proteins that are characterized by the presence of four hydrophobic domains. The proteins mediate signal transduction events that play a role in the regulation of cell development, activation, growth and motility. The use of alternate polyadenylation sites has been found for this gene. 23555 ENSG00000099282 TSPAN15 tetraspanin 15 NA
This gene is a member of the apolipoprotein L gene family, and it is present in a cluster with other family members on chromosome 22. The encoded protein is found in the cytoplasm, where it may affect the movement of lipids, including cholesterol, and/or allow the binding of lipids to organelles. In addition, expression of this gene is up-regulated by tumor necrosis factor-alpha in endothelial cells lining the normal and atherosclerotic iliac artery and aorta. Alternative splicing results in multiple transcript variants. 80833 ENSG00000128284 APOL3 apolipoprotein L3 NA
Tryptases comprise a family of trypsin-like serine proteases, the peptidase family S1. Tryptases are enzymatically active only as heparin-stabilized tetramers, and they are resistant to all known endogenous proteinase inhibitors. Several tryptase genes are clustered on chromosome 16p13.3. These genes are characterized by several distinct features. They have a highly conserved 3’ UTR and contain tandem repeat sequences at the 5’ flank and 3’ UTR which are thought to play a role in regulation of the mRNA stability. These genes have an intron immediately upstream of the initiator Met codon, which separates the site of transcription initiation from protein coding sequence. This feature is characteristic of tryptases but is unusual in other genes. The alleles of this gene exhibit an unusual amount of sequence variation, such that the alleles were once thought to represent two separate genes, alpha and beta 1. Beta tryptases appear to be the main isoenzymes expressed in mast cells; whereas in basophils, alpha tryptases predominate. Tryptases have been implicated as mediators in the pathogenesis of asthma and other allergic and inflammatory disorders. 7177 ENSG00000172236 TPSAB1 tryptase alpha/beta 1 NA
This gene encodes a member of the semicarbazide-sensitive amine oxidase family. Copper amine oxidases catalyze the oxidative conversion of amines to aldehydes in the presence of copper and quinone cofactor. The encoded protein is localized to the cell surface, has adhesive properties as well as monoamine oxidase activity, and may be involved in leukocyte trafficking. Alterations in levels of the encoded protein may be associated with many diseases, including diabetes mellitus. A pseudogene of this gene has been described and is located approximately 9-kb downstream on the same chromosome. Alternative splicing results in multiple transcript variants. 8639 ENSG00000131471 AOC3 amine oxidase, copper containing 3 NA
This gene encodes an enzyme involved in fatty acid biosynthesis, primarily the synthesis of oleic acid. The protein belongs to the fatty acid desaturase family and is an integral membrane protein located in the endoplasmic reticulum. Transcripts of approximately 3.9 and 5.2 kb, differing only by alternative polyadenylation signals, have been detected. A gene encoding a similar enzyme is located on chromosome 4 and a pseudogene of this gene is located on chromosome 17. 6319 ENSG00000099194 SCD stearoyl-CoA desaturase NA
The protein encoded by this gene belongs to the thrombospondin protein family. Thrombospondin family members are adhesive glycoproteins that mediate cell-to-cell and cell-to-matrix interactions. This protein forms a pentamer and can bind to heparin and calcium. It is involved in local signaling in the developing and adult nervous system, and it contributes to spinal sensitization and neuropathic pain states. This gene is activated during the stromal response to invasive breast cancer. It may also play a role in inflammatory responses in Alzheimer’s disease. Alternative splicing results in multiple transcript variants. 7060 ENSG00000113296 THBS4 thrombospondin 4 NA
NA NA ENSG00000259716 NA NA TRUE
The protein encoded by this gene is a member of the protein tyrosine phosphatase (PTP) family. PTPs are known to be signaling molecules that regulate a variety of cellular processes including cell growth, differentiation, mitotic cycle, and oncogenic transformation. This PTP contains an extracellular domain, a single transmembrane segment and one intracytoplasmic catalytic domain, and thus belongs to the receptor type PTPs. The extracellular region of this PTP is composed of multiple fibronectin type III repeats, which were shown to interact with neuronal receptors and cell adhesion molecules, such as contactin and tenascin C. This protein was also found to interact with sodium channels, and thus may regulate sodium channels by altering tyrosine phosphorylation status. The functions of the interaction partners of this protein implicate the roles of this PTP in cell adhesion, neurite growth, and neuronal differentiation. Alternate transcript variants encoding different isoforms have been found for this gene. 5787 ENSG00000127329 PTPRB protein tyrosine phosphatase, receptor type B NA
This gene encodes a member of the cell death-inducing DNA fragmentation factor-like effector family. Members of this family play important roles in apoptosis. The encoded protein promotes lipid droplet formation in adipocytes and may mediate adipocyte apoptosis. This gene is regulated by insulin and its expression is positively correlated with insulin sensitivity. Mutations in this gene may contribute to insulin resistant diabetes. A pseudogene of this gene is located on the short arm of chromosome 3. Alternatively spliced transcript variants that encode different isoforms have been observed for this gene. 63924 ENSG00000187288 CIDEC cell death inducing DFFA like effector c NA
Phosphatidylinositol 3-kinase phosphorylates the inositol ring of phosphatidylinositol at the 3-prime position. The enzyme comprises a 110 kD catalytic subunit and a regulatory subunit of either 85, 55, or 50 kD. This gene encodes the 85 kD regulatory subunit. Phosphatidylinositol 3-kinase plays an important role in the metabolic actions of insulin, and a mutation in this gene has been associated with insulin resistance. Alternative splicing of this gene results in four transcript variants encoding different isoforms. 5295 ENSG00000145675 PIK3R1 phosphoinositide-3-kinase regulatory subunit 1 NA
NA 8503 ENSG00000117461 PIK3R3 phosphoinositide-3-kinase regulatory subunit 3 NA
This gene encodes a member of the intermediate filament family. Intermediate filaments, along with microtubules and actin microfilaments, make up the cytoskeleton. The protein encoded by this gene is responsible for maintaining cell shape, integrity of the cytoplasm, and stabilizing cytoskeletal interactions. It is also involved in the immune response, and controls the transport of low-density lipoprotein (LDL)-derived cholesterol from a lysosome to the site of esterification. It functions as an organizer of a number of critical proteins involved in attachment, migration, and cell signaling. Mutations in this gene cause a dominant, pulverulent cataract. 7431 ENSG00000026025 VIM vimentin NA
NA ENSG00000260121 ENSG00000260121 RP5-1142A6.9 NA NA
NA 55228 ENSG00000182013 PNMAL1 paraneoplastic Ma antigen family-like 1 NA
NA 3726 ENSG00000171223 JUNB JunB proto-oncogene, AP-1 transcription factor subunit NA
This gene encodes a weak acid-active hyaluronidase. The encoded protein is similar in structure to other more active hyaluronidases. Hyaluronidases degrade hyaluronan, one of the major glycosaminoglycans of the extracellular matrix. Hyaluronan and fragments of hyaluronan are thought to be involved in cell proliferation, migration and differentiation. Although it was previously thought to be a lysosomal hyaluronidase that is active at a pH below 4, the encoded protein is likely a GPI-anchored cell surface protein. This hyaluronidase serves as a receptor for the oncogenic virus Jaagsiekte sheep retrovirus. The gene is one of several related genes in a region of chromosome 3p21.3 associated with tumor suppression. This gene encodes two alternatively spliced transcript variants which differ only in the 5’ UTR. 8692 ENSG00000068001 HYAL2 hyaluronoglucosaminidase 2 NA
The protein encoded by this gene is structurally similar to G protein-coupled receptors and is highly expressed in endothelial cells. It binds the ligand sphingosine-1-phosphate with high affinity and high specificity, and is suggested to be involved in the processes that regulate the differentiation of endothelial cells. Activation of this receptor induces cell-cell adhesion. Alternative splicing results in multiple transcript variants. 1901 ENSG00000170989 S1PR1 sphingosine-1-phosphate receptor 1 NA
The protein encoded by this gene is a membrane-bound arginine/lysine carboxypeptidase. Its expression is associated with monocyte to macrophage differentiation. This encoded protein contains hydrophobic regions at the amino and carboxy termini and has 6 potential asparagine-linked glycosylation sites. The active site residues of carboxypeptidases A and B are conserved in this protein. Three alternatively spliced transcript variants encoding the same protein have been described for this gene. 1368 ENSG00000135678 CPM carboxypeptidase M NA
Members of the CELF/BRUNOL protein family contain two N-terminal RNA recognition motif (RRM) domains, one C-terminal RRM domain, and a divergent segment of 160-230 aa between the second and third RRM domains. Members of this protein family regulate pre-mRNA alternative splicing and may also be involved in mRNA editing, and translation. Alternative splicing results in multiple transcript variants encoding different isoforms. 10659 ENSG00000048740 CELF2 CUGBP, Elav-like family member 2 NA
The protein encoded by this gene is a member of the immunophilin protein family, which play a role in immunoregulation and basic cellular processes involving protein folding and trafficking. Unlike the other members of the family, this encoded protein does not seem to have PPIase/rotamase activity. It may have a role in neurons associated with memory function. 23770 ENSG00000105701 FKBP8 FK506 binding protein 8 NA
NA NA ENSG00000256545 NA NA TRUE
Epidermodysplasia verruciformis (EV) is an autosomal recessive dermatosis characterized by abnormal susceptibility to human papillomaviruses (HPVs) and a high rate of progression to squamous cell carcinoma on sun-exposed skin. EV is caused by mutations in either of two adjacent genes located on chromosome 17q25.3. Both of these genes encode integral membrane proteins that localize to the endoplasmic reticulum and are predicted to form transmembrane channels. This gene encodes a transmembrane channel-like protein with 10 transmembrane domains and 2 leucine zipper motifs. 11322 ENSG00000141524 TMC6 transmembrane channel like 6 NA
This gene encodes a serine hydrolase of the AB hydrolase superfamily that catalyzes the conversion of monoacylglycerides to free fatty acids and glycerol. The encoded protein plays a critical role in several physiological processes including pain and nociception through hydrolysis of the endocannabinoid 2-arachidonoylglycerol. Expression of this gene may play a role in cancer tumorigenesis and metastasis. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene. 11343 ENSG00000074416 MGLL monoglyceride lipase NA
This gene encodes a member of the tyrosine protein kinase family. The encoded protein plays a critical role in angiogenesis and blood vessel stability by inhibiting angiopoietin 1 signaling through the endothelial receptor tyrosine kinase Tie2. Ectodomain cleavage of the encoded protein relieves inhibition of Tie2 and is mediated by multiple factors including vascular endothelial growth factor. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene. 7075 ENSG00000066056 TIE1 tyrosine kinase with immunoglobulin like and EGF like domains 1 NA
This gene encodes a member of the S1, or chymotrypsin, family of serine peptidases. This protease catalyzes the cleavage of factor B, the rate-limiting step of the alternative pathway of complement activation. This protein also functions as an adipokine, a cell signaling protein secreted by adipocytes, which regulates insulin secretion in mice. Mutations in this gene underlie complement factor D deficiency, which is associated with recurrent bacterial meningitis infections in human patients. Alternative splicing of this gene results in multiple transcript variants. At least one of these variants encodes a preproprotein that is proteolytically processed to generate the mature protease. 1675 ENSG00000197766 CFD complement factor D NA
This gene encodes ubiquitin, one of the most conserved proteins known. Ubiquitin has a major role in targeting cellular proteins for degradation by the 26S proteasome. It is also involved in the maintenance of chromatin structure, the regulation of gene expression, and the stress response. Ubiquitin is synthesized as a precursor protein consisting of either polyubiquitin chains or a single ubiquitin moiety fused to an unrelated protein. This gene consists of three direct repeats of the ubiquitin coding sequence with no spacer sequence. Consequently, the protein is expressed as a polyubiquitin precursor with a final amino acid after the last repeat. An aberrant form of this protein has been detected in patients with Alzheimer’s disease and Down syndrome. Pseudogenes of this gene are located on chromosomes 1, 2, 13, and 17. Alternative splicing results in multiple transcript variants. 7314 ENSG00000170315 UBB ubiquitin B NA
This gene encodes a regulatory subunit of protein phosphatase-1 (PP1). PP1 catalyzes reversible protein phosphorylation, which is important in a wide range of cellular activities: neuronal, muscular, RNA splicing, protein synthesis, cell death, and glycogen metabolism, to name just a few. By interacting with different regulatory subunits, PP1 is directed to different parts of the cell, to different substrates, or to respond to extracellular signals. 5507 ENSG00000119938 PPP1R3C protein phosphatase 1 regulatory subunit 3C NA
The protein encoded by this gene is an intermediate filament (IF) family member. IF proteins are cytoskeletal proteins that confer resistance to mechanical stress and are encoded by a dispersed multigene family. This protein has been found to form a linkage between desmin, which is a subunit of the IF network, and the extracellular matrix, and provides an important structural support in muscle. Two alternatively spliced variants encoding different isoforms have been described for this gene. 23336 ENSG00000182253 SYNM synemin NA
The protein encoded by this gene is a transmembrane (type I) heparan sulfate proteoglycan and is a member of the syndecan proteoglycan family. The syndecans mediate cell binding, cell signaling, and cytoskeletal organization and syndecan receptors are required for internalization of the HIV-1 tat protein. The syndecan-2 protein functions as an integral membrane protein and participates in cell proliferation, cell migration and cell-matrix interactions via its receptor for extracellular matrix proteins. Altered syndecan-2 expression has been detected in several different tumor types. 6383 ENSG00000169439 SDC2 syndecan 2 NA
The four human glycoprotein hormones chorionic gonadotropin (CG), luteinizing hormone (LH), follicle stimulating hormone (FSH), and thyroid stimulating hormone (TSH) are dimers consisting of alpha and beta subunits that are associated noncovalently. The alpha subunits of these hormones are identical, however, their beta chains are unique and confer biological specificity. The protein encoded by this gene is the alpha subunit and belongs to the glycoprotein hormones alpha chain family. Two transcript variants encoding different isoforms have been found for this gene. 1081 ENSG00000135346 CGA glycoprotein hormones, alpha polypeptide NA
This gene encodes a member of the NOTCH family of proteins. Members of this Type I transmembrane protein family share structural characteristics including an extracellular domain consisting of multiple epidermal growth factor-like (EGF) repeats, and an intracellular domain consisting of multiple different domain types. Notch signaling is an evolutionarily conserved intercellular signaling pathway that regulates interactions between physically adjacent cells through binding of Notch family receptors to their cognate ligands. The encoded preproprotein is proteolytically processed in the trans-Golgi network to generate two polypeptide chains that heterodimerize to form the mature cell-surface receptor. This receptor plays a role in the development of numerous cell and tissue types. Mutations in this gene are associated with aortic valve disease, Adams-Oliver syndrome, T-cell acute lymphoblastic leukemia, chronic lymphocytic leukemia, and head and neck squamous cell carcinoma. 4851 ENSG00000148400 NOTCH1 notch 1 NA
NA 222166 ENSG00000180354 MTURN maturin, neural progenitor differentiation regulator homolog (Xenopus) NA
# Save the queried Ensembl IDs for factor 18, one per line, for downstream use.
write.table(as.factor(out$query), paste0("../utilities/GTEX2013_sparse_fac_sqrt/gene_names_clus_",18,".txt"), col.names = FALSE,
            row.names=FALSE, quote=FALSE);
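The query-and-write step above is repeated almost verbatim for every factor. A minimal sketch of how it could be wrapped in a helper follows. The names `annotate_factor`, `out_dir`, and `query_fun` are hypothetical and are not part of the original analysis. The default `query_fun` issues the same `mygene::queryMany` call used in the per-factor chunks, and it is assumed that a plain character vector of IDs is an acceptable thing to write out.

```r
# Hypothetical helper (not part of the original analysis): annotate the top
# genes of factor k and write the queried Ensembl IDs to a text file.
annotate_factor <- function(k, gene_list, out_dir,
                            query_fun = function(ids) {
                              # Same query as in the per-factor chunks.
                              mygene::queryMany(ids, scopes = "ensembl.gene",
                                                fields = c("name", "summary", "symbol"),
                                                species = "human")
                            }) {
  out <- as.data.frame(query_fun(gene_list[k, ]))
  out_file <- file.path(out_dir, paste0("gene_names_clus_", k, ".txt"))
  # One Ensembl ID per line, no header, no quoting.
  write.table(out$query, out_file,
              col.names = FALSE, row.names = FALSE, quote = FALSE)
  invisible(out)
}

# Example use (assumes gene_list from the setup chunk and network access):
# for (k in seq_len(nrow(gene_list))) {
#   annotate_factor(k, gene_list, "../utilities/GTEX2013_sparse_fac_sqrt")
# }
```

Passing the query as a function argument keeps the file-writing logic checkable offline with a stub, while the default still goes through MyGene.info.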

Factor 19 Annotations

out <- mygene::queryMany(gene_list[19,],  scopes="ensembl.gene", fields=c("name", "summary", "symbol"), species="human");
## Finished
## Pass returnall=TRUE to return lists of duplicate or missing query terms.
kable(as.data.frame(out))
symbol X_id name query summary notfound
NEB 4703 nebulin ENSG00000183091 This gene encodes nebulin, a giant protein component of the cytoskeletal matrix that coexists with the thick and thin filaments within the sarcomeres of skeletal muscle. In most vertebrates, nebulin accounts for 3 to 4% of the total myofibrillar protein. The encoded protein contains approximately 30-amino acid long modules that can be classified into 7 types and other repeated modules. Protein isoform sizes vary from 600 to 800 kD due to alternative splicing that is tissue-, species-, and developmental stage-specific. Of the 183 exons in the nebulin gene, at least 43 are alternatively spliced, although exons 143 and 144 are not found in the same transcript. Of the several thousand transcript variants predicted for nebulin, the RefSeq Project has decided to create three representative RefSeq records. Mutations in this gene are associated with recessive nemaline myopathy. NA
MYBPC1 4604 myosin binding protein C, slow type ENSG00000196091 This gene encodes a member of the myosin-binding protein C family. Myosin-binding protein C family members are myosin-associated proteins found in the cross-bridge-bearing zone (C region) of A bands in striated muscle. The encoded protein is the slow skeletal muscle isoform of myosin-binding protein C and plays an important role in muscle contraction by recruiting muscle-type creatine kinase to myosin filaments. Mutations in this gene are associated with distal arthrogryposis type I. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene. NA
TPM1 7168 tropomyosin 1 (alpha) ENSG00000140416 This gene is a member of the tropomyosin family of highly conserved, widely distributed actin-binding proteins involved in the contractile system of striated and smooth muscles and the cytoskeleton of non-muscle cells. Tropomyosin is composed of two alpha-helical chains arranged as a coiled-coil. It is polymerized end to end along the two grooves of actin filaments and provides stability to the filaments. The encoded protein is one type of alpha helical chain that forms the predominant tropomyosin of striated muscle, where it also functions in association with the troponin complex to regulate the calcium-dependent interaction of actin and myosin during muscle contraction. In smooth muscle and non-muscle cells, alternatively spliced transcript variants encoding a range of isoforms have been described. Mutations in this gene are associated with type 3 familial hypertrophic cardiomyopathy. NA
ACTC1 70 actin, alpha, cardiac muscle 1 ENSG00000159251 Actins are highly conserved proteins that are involved in various types of cell motility. Polymerization of globular actin (G-actin) leads to a structural filament (F-actin) in the form of a two-stranded helix. Each actin can bind to four others. The protein encoded by this gene belongs to the actin family which is comprised of three main groups of actin isoforms, alpha, beta, and gamma. The alpha actins are found in muscle tissues and are a major constituent of the contractile apparatus. Defects in this gene have been associated with idiopathic dilated cardiomyopathy (IDC) and familial hypertrophic cardiomyopathy (FHC). NA
TNNT2 7139 troponin T2, cardiac type ENSG00000118194 The protein encoded by this gene is the tropomyosin-binding subunit of the troponin complex, which is located on the thin filament of striated muscles and regulates muscle contraction in response to alterations in intracellular calcium ion concentration. Mutations in this gene have been associated with familial hypertrophic cardiomyopathy as well as with dilated cardiomyopathy. Transcripts for this gene undergo alternative splicing that results in many tissue-specific isoforms, however, the full-length nature of some of these variants has not yet been determined. NA
MYH1 4619 myosin, heavy chain 1, skeletal muscle, adult ENSG00000109061 Myosin is a major contractile protein which converts chemical energy into mechanical energy through the hydrolysis of ATP. Myosin is a hexameric protein composed of a pair of myosin heavy chains (MYH) and two pairs of nonidentical light chains. Myosin heavy chains are encoded by a multigene family. In mammals at least 10 different myosin heavy chain (MYH) isoforms have been described from striated, smooth, and nonmuscle cells. These isoforms show expression that is spatially and temporally regulated during development. NA
MYH6 4624 myosin, heavy chain 6, cardiac muscle, alpha ENSG00000197616 Cardiac muscle myosin is a hexamer consisting of two heavy chain subunits, two light chain subunits, and two regulatory subunits. This gene encodes the alpha heavy chain subunit of cardiac myosin. The gene is located 4kb downstream of the gene encoding the beta heavy chain subunit of cardiac myosin. Mutations in this gene cause familial hypertrophic cardiomyopathy and atrial septal defect 3. NA
NPPA 4878 natriuretic peptide A ENSG00000175206 The protein encoded by this gene belongs to the natriuretic peptide family. Natriuretic peptides are implicated in the control of extracellular fluid volume and electrolyte homeostasis. This protein is synthesized as a large precursor (containing a signal peptide), which is processed to release a peptide from the N-terminus with similarity to vasoactive peptide, cardiodilatin, and another peptide from the C-terminus with natriuretic-diuretic activity. Mutations in this gene have been associated with atrial fibrillation familial type 6. This gene is located adjacent to another member of the natriuretic family of peptides on chromosome 1. NA
MYBPC3 4607 myosin binding protein C, cardiac ENSG00000134571 MYBPC3 encodes the cardiac isoform of myosin-binding protein C. Myosin-binding protein C is a myosin-associated protein found in the cross-bridge-bearing zone (C region) of A bands in striated muscle. MYBPC3, the cardiac isoform, is expressed exclusively in heart muscle. Regulatory phosphorylation of the cardiac isoform in vivo by cAMP-dependent protein kinase (PKA) upon adrenergic stimulation may be linked to modulation of cardiac contraction. Mutations in MYBPC3 are one cause of familial hypertrophic cardiomyopathy. NA
TPT1 7178 tumor protein, translationally-controlled 1 ENSG00000133112 NA NA
PDK4 5166 pyruvate dehydrogenase kinase 4 ENSG00000004799 This gene is a member of the PDK/BCKDK protein kinase family and encodes a mitochondrial protein with a histidine kinase domain. This protein is located in the matrix of the mitochondria and inhibits the pyruvate dehydrogenase complex by phosphorylating one of its subunits, thereby contributing to the regulation of glucose metabolism. Expression of this gene is regulated by glucocorticoids, retinoic acid and insulin. NA
RYR1 6261 ryanodine receptor 1 ENSG00000196218 This gene encodes a ryanodine receptor found in skeletal muscle. The encoded protein functions as a calcium release channel in the sarcoplasmic reticulum but also serves to connect the sarcoplasmic reticulum and transverse tubule. Mutations in this gene are associated with malignant hyperthermia susceptibility, central core disease, and minicore myopathy with external ophthalmoplegia. Alternatively spliced transcripts encoding different isoforms have been described. NA
TNNC2 7125 troponin C2, fast skeletal type ENSG00000101470 Troponin (Tn), a key protein complex in the regulation of striated muscle contraction, is composed of 3 subunits. The Tn-I subunit inhibits actomyosin ATPase, the Tn-T subunit binds tropomyosin and Tn-C, while the Tn-C subunit binds calcium and overcomes the inhibitory action of the troponin complex on actin filaments. The protein encoded by this gene is the Tn-C subunit. NA
MYL7 58498 myosin light chain 7 ENSG00000106631 NA NA
ATP2A1 487 ATPase sarcoplasmic/endoplasmic reticulum Ca2+ transporting 1 ENSG00000196296 This gene encodes one of the SERCA Ca(2+)-ATPases, which are intracellular pumps located in the sarcoplasmic or endoplasmic reticula of muscle cells. This enzyme catalyzes the hydrolysis of ATP coupled with the translocation of calcium from the cytosol to the sarcoplasmic reticulum lumen, and is involved in muscular excitation and contraction. Mutations in this gene cause some autosomal recessive forms of Brody disease, characterized by increasing impairment of muscular relaxation during exercise. Alternative splicing results in three transcript variants encoding different isoforms. NA
MYH2 4620 myosin, heavy chain 2, skeletal muscle, adult ENSG00000125414 Myosins are actin-based motor proteins that function in the generation of mechanical force in eukaryotic cells. Muscle myosins are heterohexamers composed of 2 myosin heavy chains and 2 pairs of nonidentical myosin light chains. This gene encodes a member of the class II or conventional myosin heavy chains, and functions in skeletal muscle contraction. This gene is found in a cluster of myosin heavy chain genes on chromosome 17. A mutation in this gene results in inclusion body myopathy-3. Multiple alternatively spliced variants, encoding the same protein, have been identified. NA
DKK3 27122 dickkopf WNT signaling pathway inhibitor 3 ENSG00000050165 This gene encodes a protein that is a member of the dickkopf family. The secreted protein contains two cysteine rich regions and is involved in embryonic development through its interactions with the Wnt signaling pathway. The expression of this gene is decreased in a variety of cancer cell lines and it may function as a tumor suppressor gene. Alternative splicing results in multiple transcript variants encoding the same protein. NA
CPA1 1357 carboxypeptidase A1 ENSG00000091704 This gene encodes a member of the carboxypeptidase A family of zinc metalloproteases. This enzyme is produced in the pancreas and preferentially cleaves C-terminal branched-chain and aromatic amino acids from dietary proteins. This gene and several family members are present in a gene cluster on chromosome 7. Mutations in this gene may be linked to chronic pancreatitis, while elevated protein levels may be associated with pancreatic cancer. NA
MYH11 4629 myosin, heavy chain 11, smooth muscle ENSG00000133392 The protein encoded by this gene is a smooth muscle myosin belonging to the myosin heavy chain family. The gene product is a subunit of a hexameric protein that consists of two heavy chain subunits and two pairs of non-identical light chain subunits. It functions as a major contractile protein, converting chemical energy into mechanical energy through the hydrolysis of ATP. The gene encoding a human ortholog of rat NUDE1 is transcribed from the reverse strand of this gene, and its 3’ end overlaps with that of the latter. The pericentric inversion of chromosome 16 [inv(16)(p13q22)] produces a chimeric transcript that encodes a protein consisting of the first 165 residues from the N terminus of core-binding factor beta in a fusion with the C-terminal portion of the smooth muscle myosin heavy chain. This chromosomal rearrangement is associated with acute myeloid leukemia of the M4Eo subtype. Alternative splicing generates isoforms that are differentially expressed, with ratios changing during muscle cell maturation. Alternatively spliced transcript variants encoding different isoforms have been identified. NA
NPPA-AS1 ENSG00000242349 NPPA antisense RNA 1 ENSG00000242349 NA NA
MYBPC2 4606 myosin binding protein C, fast type ENSG00000086967 This gene encodes a member of the myosin-binding protein C family. This family includes the fast-, slow- and cardiac-type isoforms, each of which is a myosin-associated protein found in the cross-bridge-bearing zone (C region) of A bands in striated muscle. The protein encoded by this locus is referred to as the fast-type isoform. Mutations in the related but distinct genes encoding the slow-type and cardiac-type isoforms have been associated with distal arthrogryposis, type 1 and hypertrophic cardiomyopathy, respectively. NA
CPB1 1360 carboxypeptidase B1 ENSG00000153002 Three different procarboxypeptidases A and two different procarboxypeptidases B have been isolated. The B1 and B2 forms differ from each other mainly in isoelectric point. Carboxypeptidase B1 is a highly tissue-specific protein and is a useful serum marker for acute pancreatitis and dysfunction of pancreatic transplants. It is not elevated in pancreatic carcinoma. NA
ANKRD1 27063 ankyrin repeat domain 1 ENSG00000148677 The protein encoded by this gene is localized to the nucleus of endothelial cells and is induced by IL-1 and TNF-alpha stimulation. Studies in rat cardiomyocytes suggest that this gene functions as a transcription factor. Interactions between this protein and the sarcomeric proteins myopalladin and titin suggest that it may also be involved in the myofibrillar stretch-sensor system. NA
PRSS1 5644 protease, serine 1 ENSG00000204983 This gene encodes a trypsinogen, which is a member of the trypsin family of serine proteases. This enzyme is secreted by the pancreas and cleaved to its active form in the small intestine. It is active on peptide linkages involving the carboxyl group of lysine or arginine. Mutations in this gene are associated with hereditary pancreatitis. This gene and several other trypsinogen genes are localized to the T cell receptor beta locus on chromosome 7. NA
BIN1 274 bridging integrator 1 ENSG00000136717 This gene encodes several isoforms of a nucleocytoplasmic adaptor protein, one of which was initially identified as a MYC-interacting protein with features of a tumor suppressor. Isoforms that are expressed in the central nervous system may be involved in synaptic vesicle endocytosis and may interact with dynamin, synaptojanin, endophilin, and clathrin. Isoforms that are expressed in muscle and ubiquitously expressed isoforms localize to the cytoplasm and nucleus and activate a caspase-independent apoptotic process. Studies in mouse suggest that this gene plays an important role in cardiac muscle development. Alternate splicing of the gene results in several transcript variants encoding different isoforms. Aberrant splice variants expressed in tumor cell lines have also been described. NA
PYGB 5834 phosphorylase, glycogen; brain ENSG00000100994 The protein encoded by this gene is a glycogen phosphorylase found predominantly in the brain. The encoded protein forms homodimers which can associate into homotetramers, the enzymatically active form of glycogen phosphorylase. The activity of this enzyme is positively regulated by AMP and negatively regulated by ATP, ADP, and glucose-6-phosphate. This enzyme catalyzes the rate-determining step in glycogen degradation. NA
MYL1 4632 myosin light chain 1 ENSG00000168530 Myosin is a hexameric ATPase cellular motor protein. It is composed of two heavy chains, two nonphosphorylatable alkali light chains, and two phosphorylatable regulatory light chains. This gene encodes a myosin alkali light chain expressed in fast skeletal muscle. Two transcript variants have been identified for this gene. NA
FN1 2335 fibronectin 1 ENSG00000115414 This gene encodes fibronectin, a glycoprotein present in a soluble dimeric form in plasma, and in a dimeric or multimeric form at the cell surface and in extracellular matrix. The encoded preproprotein is proteolytically processed to generate the mature protein. Fibronectin is involved in cell adhesion and migration processes including embryogenesis, wound healing, blood coagulation, host defense, and metastasis. The gene has three regions subject to alternative splicing, with the potential to produce 20 different transcript variants, at least one of which encodes an isoform that undergoes proteolytic processing. The full-length nature of some variants has not been determined. NA
TNNI3 7137 troponin I3, cardiac type ENSG00000129991 Troponin I (TnI), along with troponin T (TnT) and troponin C (TnC), is one of 3 subunits that form the troponin complex of the thin filaments of striated muscle. TnI is the inhibitory subunit, blocking actin-myosin interactions and thereby mediating striated muscle relaxation. The TnI subfamily contains three genes: TnI-skeletal-fast-twitch, TnI-skeletal-slow-twitch, and TnI-cardiac. This gene encodes the TnI-cardiac protein and is exclusively expressed in cardiac muscle tissues. Mutations in this gene cause familial hypertrophic cardiomyopathy type 7 (CMH7) and familial restrictive cardiomyopathy (RCM). NA
RP11-290D2.6 ENSG00000273149 NA ENSG00000273149 NA NA
CRIP2 1397 cysteine rich protein 2 ENSG00000182809 This gene encodes a putative transcription factor with two LIM zinc-binding domains. The encoded protein may participate in the differentiation of smooth muscle tissue. Alternative splicing results in multiple transcript variants. NA
NEAT1 283131 nuclear paraspeckle assembly transcript 1 (non-protein coding) ENSG00000245532 This gene produces a long non-coding RNA (lncRNA) transcribed from the multiple endocrine neoplasia locus. This lncRNA is retained in the nucleus where it forms the core structural component of the paraspeckle sub-organelles. It may act as a transcriptional regulator for numerous genes, including some genes involved in cancer progression. NA
PNLIP 5406 pancreatic lipase ENSG00000175535 This gene is a member of the lipase gene family. It encodes a carboxyl esterase that hydrolyzes insoluble, emulsified triglycerides, and is essential for the efficient digestion of dietary fats. This gene is expressed specifically in the pancreas. NA
GP2 2813 glycoprotein 2 ENSG00000169347 This gene encodes an integral membrane protein that is secreted from intracellular zymogen granules and associates with the plasma membrane via glycosylphosphatidylinositol (GPI) linkage. The encoded protein binds pathogens such as enterobacteria, thereby playing an important role in the innate immune response. The C-terminus of this protein is related to the C-terminus of the protein encoded by the neighboring gene, uromodulin (UMOD). Alternative splicing results in multiple transcript variants. NA
YBX3 8531 Y-box binding protein 3 ENSG00000060138 NA NA
KLHL41 10324 kelch like family member 41 ENSG00000239474 This gene is a member of the kelch-like family. The encoded protein contains a BACK domain, a BTB/POZ domain, and 5 Kelch repeats. This protein is thought to function in skeletal muscle development and maintenance. Mutations in this gene have been associated with nemaline myopathy (NM), a rare congenital muscle disorder. NA
ZFAND5 7763 zinc finger AN1-type containing 5 ENSG00000107372 NA NA
HBB 3043 hemoglobin subunit beta ENSG00000244734 The alpha (HBA) and beta (HBB) loci determine the structure of the 2 types of polypeptide chains in adult hemoglobin, Hb A. The normal adult hemoglobin tetramer consists of two alpha chains and two beta chains. Mutant beta globin causes sickle cell anemia. Absence of beta chain causes beta-zero-thalassemia. Reduced amounts of detectable beta globin causes beta-plus-thalassemia. The order of the genes in the beta-globin cluster is 5’-epsilon – gamma-G – gamma-A – delta – beta–3’. NA
KRT10 3858 keratin 10 ENSG00000186395 This gene encodes a member of the type I (acidic) cytokeratin family, which belongs to the superfamily of intermediate filament (IF) proteins. Keratins are heteropolymeric structural proteins which form the intermediate filament. These filaments, along with actin microfilaments and microtubules, compose the cytoskeleton of epithelial cells. Mutations in this gene are associated with epidermolytic hyperkeratosis. This gene is located within a cluster of keratin family members on chromosome 17q21. NA
MYLPF 29895 myosin light chain, phosphorylatable, fast skeletal muscle ENSG00000180209 NA NA
TNNI2 7136 troponin I2, fast skeletal type ENSG00000130598 This gene encodes a fast-twitch skeletal muscle protein, a member of the troponin I gene family, and a component of the troponin complex including troponin T, troponin C and troponin I subunits. The troponin complex, along with tropomyosin, is responsible for the calcium-dependent regulation of striated muscle contraction. Mouse studies show that this component is also present in vascular smooth muscle and may play a role in regulation of smooth muscle function. In addition to muscle tissues, this protein is found in corneal epithelium, cartilage where it is an inhibitor of angiogenesis to inhibit tumor growth and metastasis, and mammary gland where it functions as a co-activator of estrogen receptor-related receptor alpha. This protein also suppresses tumor growth in human ovarian carcinoma. Mutations in this gene cause myopathy and distal arthrogryposis type 2B. Alternatively spliced transcript variants have been found for this gene. NA
CELA3A 10136 chymotrypsin like elastase family member 3A ENSG00000142789 Elastases form a subfamily of serine proteases that hydrolyze many proteins in addition to elastin. Humans have six elastase genes which encode the structurally similar proteins elastase 1, 2, 2A, 2B, 3A, and 3B. Unlike other elastases, elastase 3A has little elastolytic activity. Like most of the human elastases, elastase 3A is secreted from the pancreas as a zymogen and, like other serine proteases such as trypsin, chymotrypsin and kallikrein, it has a digestive function in the intestine. Elastase 3A preferentially cleaves proteins after alanine residues. Elastase 3A may also function in the intestinal transport and metabolism of cholesterol. Both elastase 3A and elastase 3B have been referred to as protease E and as elastase 1. NA
PFKFB3 5209 6-phosphofructo-2-kinase/fructose-2,6-biphosphatase 3 ENSG00000170525 The protein encoded by this gene belongs to a family of bifunctional proteins that are involved in both the synthesis and degradation of fructose-2,6-bisphosphate, a regulatory molecule that controls glycolysis in eukaryotes. The encoded protein has a 6-phosphofructo-2-kinase activity that catalyzes the synthesis of fructose-2,6-bisphosphate (F2,6BP), and a fructose-2,6-biphosphatase activity that catalyzes the degradation of F2,6BP. This protein is required for cell cycle progression and prevention of apoptosis. It functions as a regulator of cyclin-dependent kinase 1, linking glucose metabolism to cell proliferation and survival in tumor cells. Several alternatively spliced transcript variants encoding different isoforms have been found for this gene. NA
TNNT3 7140 troponin T3, fast skeletal type ENSG00000130595 The binding of Ca(2+) to the trimeric troponin complex initiates the process of muscle contraction. Increased Ca(2+) concentrations produce a conformational change in the troponin complex that is transmitted to tropomyosin dimers situated along actin filaments. The altered conformation permits increased interaction between a myosin head and an actin filament which, ultimately, produces a muscle contraction. The troponin complex has protein subunits C, I, and T. Subunit C binds Ca(2+) and subunit I binds to actin and inhibits actin-myosin interaction. Subunit T binds the troponin complex to the tropomyosin complex and is also required for Ca(2+)-mediated activation of actomyosin ATPase activity. There are 3 different troponin T genes that encode tissue-specific isoforms of subunit T for fast skeletal-, slow skeletal-, and cardiac-muscle. This gene encodes fast skeletal troponin T protein; also known as troponin T type 3. Alternative splicing results in multiple transcript variants encoding additional distinct troponin T type 3 isoforms. A developmentally regulated switch between fetal/neonatal and adult troponin T type 3 isoforms occurs. Additional splice variants have been described but their biological validity has not been established. Mutations in this gene may cause distal arthrogryposis multiplex congenita type 2B (DA2B). NA
HSPB7 27129 heat shock protein family B (small) member 7 ENSG00000173641 NA NA
STAC3 246329 SH3 and cysteine rich domain 3 ENSG00000185482 The protein encoded by this gene is a component of the excitation-contraction coupling machinery of muscles. This protein is a member of the Stac gene family and contains an N-terminal cysteine-rich domain and two SH3 domains. Mutations in this gene are a cause of Native American myopathy. NA
MYL4 4635 myosin light chain 4 ENSG00000198336 Myosin is a hexameric ATPase cellular motor protein. It is composed of two myosin heavy chains, two nonphosphorylatable myosin alkali light chains, and two phosphorylatable myosin regulatory light chains. This gene encodes a myosin alkali light chain that is found in embryonic muscle and adult atria. Two alternatively spliced transcript variants encoding the same protein have been found for this gene. NA
MYL9 10398 myosin light chain 9 ENSG00000101335 Myosin, a structural component of muscle, consists of two heavy chains and four light chains. The protein encoded by this gene is a myosin light chain that may regulate muscle contraction by modulating the ATPase activity of myosin heads. The encoded protein binds calcium and is activated by myosin light chain kinase. Two transcript variants encoding different isoforms have been found for this gene. NA
TNNT1 7138 troponin T1, slow skeletal type ENSG00000105048 This gene encodes a protein that is a subunit of troponin, which is a regulatory complex located on the thin filament of the sarcomere. This complex regulates striated muscle contraction in response to fluctuations in intracellular calcium concentration. This complex is composed of three subunits: troponin C, which binds calcium, troponin T, which binds tropomyosin, and troponin I, which is an inhibitory subunit. This protein is the slow skeletal troponin T subunit. Mutations in this gene cause nemaline myopathy type 5, also known as Amish nemaline myopathy, a neuromuscular disorder characterized by muscle weakness and rod-shaped, or nemaline, inclusions in skeletal muscle fibers which affects infants, resulting in death due to respiratory insufficiency, usually in the second year. Multiple transcript variants encoding different isoforms have been found for this gene. NA
CASQ2 845 calsequestrin 2 ENSG00000118729 The protein encoded by this gene specifies the cardiac muscle family member of the calsequestrin family. Calsequestrin is localized to the sarcoplasmic reticulum in cardiac and slow skeletal muscle cells. The protein is a calcium binding protein that stores calcium for muscle function. Mutations in this gene cause stress-induced polymorphic ventricular tachycardia, also referred to as catecholaminergic polymorphic ventricular tachycardia 2 (CPVT2), a disease characterized by bidirectional ventricular tachycardia that may lead to cardiac arrest. NA
SLC7A2 6542 solute carrier family 7 member 2 ENSG00000003989 The protein encoded by this gene is a cationic amino acid transporter and a member of the APC (amino acid-polyamine-organocation) family of transporters. The encoded membrane protein is responsible for the cellular uptake of arginine, lysine and ornithine. Three transcript variants encoding different isoforms have been found for this gene. NA
NA NA NA ENSG00000259716 NA TRUE
RBPMS2 348093 RNA binding protein with multiple splicing 2 ENSG00000166831 NA NA
CEL 1056 carboxyl ester lipase ENSG00000170835 The protein encoded by this gene is a glycoprotein secreted from the pancreas into the digestive tract and from the lactating mammary gland into human milk. The physiological role of this protein is in cholesterol and lipid-soluble vitamin ester hydrolysis and absorption. This encoded protein promotes large chylomicron production in the intestine. Also its presence in plasma suggests its interactions with cholesterol and oxidized lipoproteins to modulate the progression of atherosclerosis. In pancreatic tumoral cells, this encoded protein is thought to be sequestrated within the Golgi compartment and is probably not secreted. This gene contains a variable number of tandem repeat (VNTR) polymorphism in the coding region that may influence the function of the encoded protein. NA
CLPS 1208 colipase ENSG00000137392 The protein encoded by this gene is a cofactor needed by pancreatic lipase for efficient dietary lipid hydrolysis. It binds to the C-terminal, non-catalytic domain of lipase, thereby stabilizing an active conformation and considerably increasing the overall hydrophobic binding site. The gene product allows lipase to anchor noncovalently to the surface of lipid micelles, counteracting the destabilizing influence of intestinal bile salts. This cofactor is only expressed in pancreatic acinar cells, suggesting regulation of expression by tissue-specific elements. Three transcript variants encoding different isoforms have been found for this gene. NA
CTRB2 440387 chymotrypsinogen B2 ENSG00000168928 NA NA
KRT1 3848 keratin 1 ENSG00000167768 The protein encoded by this gene is a member of the keratin gene family. The type II cytokeratins consist of basic or neutral proteins which are arranged in pairs of heterotypic keratin chains coexpressed during differentiation of simple and stratified epithelial tissues. This type II cytokeratin is specifically expressed in the spinous and granular layers of the epidermis with family member KRT10 and mutations in these genes have been associated with bullous congenital ichthyosiform erythroderma. The type II cytokeratins are clustered in a region of chromosome 12q12-q13. NA
MICAL2 9645 microtubule associated monooxygenase, calponin and LIM domain containing 2 ENSG00000133816 NA NA
PYGM 5837 phosphorylase, glycogen, muscle ENSG00000068976 This gene encodes a muscle enzyme involved in glycogenolysis. Highly similar enzymes encoded by different genes are found in liver and brain. Mutations in this gene are associated with McArdle disease (myophosphorylase deficiency), a glycogen storage disease of muscle. Alternative splicing results in multiple transcript variants. NA
NPPB 4879 natriuretic peptide B ENSG00000120937 This gene is a member of the natriuretic peptide family and encodes a secreted protein which functions as a cardiac hormone. The protein undergoes two cleavage events, one within the cell and a second after secretion into the blood. The protein’s biological actions include natriuresis, diuresis, vasorelaxation, inhibition of renin and aldosterone secretion, and a key role in cardiovascular homeostasis. A high concentration of this protein in the bloodstream is indicative of heart failure. The protein also acts as an antimicrobial peptide with antibacterial and antifungal activity. Mutations in this gene have been associated with postmenopausal osteoporosis. NA
KIAA0368 23392 KIAA0368 ENSG00000136813 NA NA
CA3 761 carbonic anhydrase 3 ENSG00000164879 Carbonic anhydrase III (CAIII) is a member of a multigene family (at least six separate genes are known) that encodes carbonic anhydrase isozymes. These carbonic anhydrases are a class of metalloenzymes that catalyze the reversible hydration of carbon dioxide and are differentially expressed in a number of cell types. The expression of the CA3 gene is strictly tissue specific and present at high levels in skeletal muscle and much lower levels in cardiac and smooth muscle. A proportion of carriers of Duchenne muscle dystrophy have a higher CA3 level than normal. The gene spans 10.3 kb and contains seven exons and six introns. NA
CLIC4 25932 chloride intracellular channel 4 ENSG00000169504 Chloride channels are a diverse group of proteins that regulate fundamental cellular processes including stabilization of cell membrane potential, transepithelial transport, maintenance of intracellular pH, and regulation of cell volume. Chloride intracellular channel 4 (CLIC4) protein, encoded by the CLIC4 gene, is a member of the p64 family; the gene is expressed in many tissues and exhibits an intracellular vesicular pattern in Panc-1 cells (pancreatic cancer cells). NA
C3 718 complement component 3 ENSG00000125730 Complement component C3 plays a central role in the activation of complement system. Its activation is required for both classical and alternative complement activation pathways. The encoded preproprotein is proteolytically processed to generate alpha and beta subunits that form the mature protein, which is then further processed to generate numerous peptide products. The C3a peptide, also known as the C3a anaphylatoxin, modulates inflammation and possesses antimicrobial activity. Mutations in this gene are associated with atypical hemolytic uremic syndrome and age-related macular degeneration in human patients. NA
EIF4B 1975 eukaryotic translation initiation factor 4B ENSG00000063046 NA NA
CELA3B 23436 chymotrypsin like elastase family member 3B ENSG00000219073 Elastases form a subfamily of serine proteases that hydrolyze many proteins in addition to elastin. Humans have six elastase genes which encode the structurally similar proteins elastase 1, 2, 2A, 2B, 3A, and 3B. Unlike other elastases, elastase 3B has little elastolytic activity. Like most of the human elastases, elastase 3B is secreted from the pancreas as a zymogen and, like other serine proteases such as trypsin, chymotrypsin and kallikrein, it has a digestive function in the intestine. Elastase 3B preferentially cleaves proteins after alanine residues. Elastase 3B may also function in the intestinal transport and metabolism of cholesterol. Both elastase 3A and elastase 3B have been referred to as protease E and as elastase 1, and excretion of this protein in fecal material is frequently used as a measure of pancreatic function in clinical assays. NA
KRT2 3849 keratin 2 ENSG00000172867 The protein encoded by this gene is a member of the keratin gene family. The type II cytokeratins consist of basic or neutral proteins which are arranged in pairs of heterotypic keratin chains coexpressed during differentiation of simple and stratified epithelial tissues. This type II cytokeratin is expressed largely in the upper spinous layer of epidermal keratinocytes and mutations in this gene have been associated with bullous congenital ichthyosiform erythroderma. The type II cytokeratins are clustered in a region of chromosome 12q12-q13. NA
MKNK2 2872 MAP kinase interacting serine/threonine kinase 2 ENSG00000099875 This gene encodes a member of the calcium/calmodulin-dependent protein kinases (CAMK) Ser/Thr protein kinase family, which belongs to the protein kinase superfamily. This protein contains conserved DLG (asp-leu-gly) and ENIL (glu-asn-ile-leu) motifs, and an N-terminal polybasic region which binds importin A and the translation factor scaffold protein eukaryotic initiation factor 4G (eIF4G). This protein is one of the downstream kinases activated by mitogen-activated protein (MAP) kinases. It phosphorylates the eukaryotic initiation factor 4E (eIF4E), thus playing important roles in the initiation of mRNA translation, oncogenic transformation and malignant cell proliferation. In addition to eIF4E, this protein also interacts with von Hippel-Lindau tumor suppressor (VHL), ring-box 1 (Rbx1) and Cullin2 (Cul2), which are all components of the CBC(VHL) ubiquitin ligase E3 complex. Multiple alternatively spliced transcript variants have been found, but the full-length nature and biological activity of only two variants are determined. These two variants encode distinct isoforms which differ in activity and regulation, and in subcellular localization. NA
NEBL 10529 nebulette ENSG00000078114 This gene encodes a nebulin like protein that is abundantly expressed in cardiac muscle. The encoded protein binds actin and interacts with thin filaments and Z-line associated proteins in striated muscle. This protein may be involved in cardiac myofibril assembly. A shorter isoform of this protein termed LIM nebulette is expressed in non-muscle cells and may function as a component of focal adhesion complexes. Alternate splicing results in multiple transcript variants. NA
PABPC4 8761 poly(A) binding protein cytoplasmic 4 ENSG00000090621 Poly(A)-binding proteins (PABPs) bind to the poly(A) tail present at the 3-prime ends of most eukaryotic mRNAs. PABPC4 or IPABP (inducible PABP) was isolated as an activation-induced T-cell mRNA encoding a protein. Activation of T cells increased PABPC4 mRNA levels in T cells approximately 5-fold. PABPC4 contains 4 RNA-binding domains and proline-rich C terminus. PABPC4 is localized primarily to the cytoplasm. It is suggested that PABPC4 might be necessary for regulation of stability of labile mRNA species in activated T cells. PABPC4 was also identified as an antigen, APP1 (activated-platelet protein-1), expressed on thrombin-activated rabbit platelets. PABPC4 may also be involved in the regulation of protein translation in platelets and megakaryocytes or may participate in the binding or stabilization of polyadenylates in platelet dense granules. Alternatively spliced transcript variants encoding different isoforms have been found for this gene. NA
CTRB1 1504 chymotrypsinogen B1 ENSG00000168925 The protein encoded by this gene is one of a family of serine proteases that is secreted into the gastrointestinal tract as an inactive precursor, which is activated by proteolytic cleavage with trypsin. NA
POPDC2 64091 popeye domain containing 2 ENSG00000121577 This gene encodes a member of the POP family of proteins which contain three putative transmembrane domains. This membrane associated protein is predominantly expressed in skeletal and cardiac muscle, and may have an important function in these tissues. NA
FOXO1 2308 forkhead box O1 ENSG00000150907 This gene belongs to the forkhead family of transcription factors which are characterized by a distinct forkhead domain. The specific function of this gene has not yet been determined; however, it may play a role in myogenic growth and differentiation. Translocation of this gene with PAX3 has been associated with alveolar rhabdomyosarcoma. NA
ACTB 60 actin, beta ENSG00000075624 This gene encodes one of six different actin proteins. Actins are highly conserved proteins that are involved in cell motility, structure, and integrity. This actin is a major constituent of the contractile apparatus and one of the two nonmuscle cytoskeletal actins. NA
MYADM 91663 myeloid-associated differentiation marker ENSG00000179820 NA NA
SLC25A4 291 solute carrier family 25 member 4 ENSG00000151729 This gene is a member of the mitochondrial carrier subfamily of solute carrier protein genes. The product of this gene functions as a gated pore that translocates ADP from the cytoplasm into the mitochondrial matrix and ATP from the mitochondrial matrix into the cytoplasm. The protein forms a homodimer embedded in the inner mitochondrial membrane. Mutations in this gene have been shown to result in autosomal dominant progressive external ophthalmoplegia and familial hypertrophic cardiomyopathy. NA
VIM 7431 vimentin ENSG00000026025 This gene encodes a member of the intermediate filament family. Intermediate filaments, along with microtubules and actin microfilaments, make up the cytoskeleton. The protein encoded by this gene is responsible for maintaining cell shape, integrity of the cytoplasm, and stabilizing cytoskeletal interactions. It is also involved in the immune response, and controls the transport of low-density lipoprotein (LDL)-derived cholesterol from a lysosome to the site of esterification. It functions as an organizer of a number of critical proteins involved in attachment, migration, and cell signaling. Mutations in this gene cause a dominant, pulverulent cataract. NA
TNC 3371 tenascin C ENSG00000041982 This gene encodes an extracellular matrix protein with a spatially and temporally restricted tissue distribution. This protein is homohexameric with disulfide-linked subunits, and contains multiple EGF-like and fibronectin type-III domains. It is implicated in guidance of migrating neurons as well as axons during development, synaptic plasticity, and neuronal regeneration. NA
CPA2 1358 carboxypeptidase A2 ENSG00000158516 Three different forms of human pancreatic procarboxypeptidase A have been isolated. The encoded protein represents the A2 form, which is a monomeric protein with different biochemical properties from the A1 and A3 forms. The A2 form of pancreatic procarboxypeptidase acts on aromatic C-terminal residues and is a secreted protein. NA
LAMA5 3911 laminin subunit alpha 5 ENSG00000130702 This gene encodes one of the vertebrate laminin alpha chains. Laminins, a family of extracellular matrix glycoproteins, are the major noncollagenous constituent of basement membranes. They have been implicated in a wide variety of biological processes including cell adhesion, differentiation, migration, signaling, neurite outgrowth and metastasis. Laminins are composed of 3 non identical chains: laminin alpha, beta and gamma (formerly A, B1, and B2, respectively) and they form a cruciform structure consisting of 3 short arms, each formed by a different chain, and a long arm composed of all 3 chains. Each laminin chain is a multidomain protein encoded by a distinct gene. The protein encoded by this gene is the alpha-5 subunit of laminin-10 (laminin-511), laminin-11 (laminin-521) and laminin-15 (laminin-523). NA
MXRA7 439921 matrix remodeling associated 7 ENSG00000182534 NA NA
JPH2 57158 junctophilin 2 ENSG00000149596 Junctional complexes between the plasma membrane and endoplasmic/sarcoplasmic reticulum are a common feature of all excitable cell types and mediate cross talk between cell surface and intracellular ion channels. The protein encoded by this gene is a component of junctional complexes and is composed of a C-terminal hydrophobic segment spanning the endoplasmic/sarcoplasmic reticulum membrane and a remaining cytoplasmic domain that shows specific affinity for the plasma membrane. This gene is a member of the junctophilin gene family. Alternative splicing has been observed at this locus and two variants encoding distinct isoforms are described. NA
PLA2G1B 5319 phospholipase A2 group IB ENSG00000170890 This gene encodes a secreted member of the phospholipase A2 (PLA2) class of enzymes, which is produced by the pancreatic acinar cells. The encoded calcium-dependent enzyme catalyzes the hydrolysis of the sn-2 position of membrane glycerophospholipids to release arachidonic acid (AA) and lysophospholipids. AA is subsequently converted by downstream metabolic enzymes to several bioactive lipophilic compounds (eicosanoids), including prostaglandins (PGs) and leukotrienes (LTs). The enzyme may be involved in several physiological processes including cell contraction, cell proliferation and pathological response. NA
DEPTOR 64798 DEP domain containing MTOR-interacting protein ENSG00000155792 NA NA
AMY2A 279 amylase, alpha 2A (pancreatic) ENSG00000243480 This gene encodes a member of the alpha-amylase family of proteins. Amylases are secreted proteins that hydrolyze 1,4-alpha-glucoside bonds in oligosaccharides and polysaccharides, catalyzing the first step in digestion of dietary starch and glycogen. This gene and several family members are present in a gene cluster on chromosome 1. This gene encodes an amylase isoenzyme produced by the pancreas. NA
PPP1R27 116729 protein phosphatase 1 regulatory subunit 27 ENSG00000182676 NA NA
KIAA1217 56243 KIAA1217 ENSG00000120549 NA NA
ABLIM2 84448 actin binding LIM protein family member 2 ENSG00000163995 NA NA
LPIN1 23175 lipin 1 ENSG00000134324 This gene encodes a magnesium-ion-dependent phosphatidic acid phosphohydrolase enzyme that catalyzes the penultimate step in triglyceride synthesis including the dephosphorylation of phosphatidic acid to yield diacylglycerol. Expression of this gene is required for adipocyte differentiation and it also functions as a nuclear transcriptional coactivator with some peroxisome proliferator-activated receptors to modulate expression of other genes involved in lipid metabolism. Mutations in this gene are associated with metabolic syndrome, type 2 diabetes, and autosomal recessive acute recurrent myoglobinuria (ARARM). This gene is also a candidate for several human lipodystrophy syndromes. Alternative splicing results in multiple transcript variants encoding distinct isoforms. Additional splice variants have been described but their full-length structures have not been determined. NA
HECTD1 25831 HECT domain E3 ubiquitin protein ligase 1 ENSG00000092148 NA NA
ADCK3 56997 aarF domain containing kinase 3 ENSG00000163050 This gene encodes a mitochondrial protein similar to yeast ABC1, which functions in an electron-transferring membrane protein complex in the respiratory chain. It is not related to the family of ABC transporter proteins. Expression of this gene is induced by the tumor suppressor p53 and in response to DNA damage, and inhibiting its expression partially suppresses p53-induced apoptosis. Alternatively spliced transcript variants have been found; however, their full-length nature has not been determined. NA
FABP4 2167 fatty acid binding protein 4 ENSG00000170323 FABP4 encodes the fatty acid binding protein found in adipocytes. Fatty acid binding proteins are a family of small, highly conserved, cytoplasmic proteins that bind long-chain fatty acids and other hydrophobic ligands. It is thought that FABPs roles include fatty acid uptake, transport, and metabolism. NA
LOC100507537 100507537 uncharacterized LOC100507537 ENSG00000240045 NA NA
FAM46B 115572 family with sequence similarity 46 member B ENSG00000158246 NA NA
PPP1R12C 54776 protein phosphatase 1 regulatory subunit 12C ENSG00000125503 The gene encodes a subunit of myosin phosphatase. The encoded protein regulates the catalytic activity of protein phosphatase 1 delta and assembly of the actin cytoskeleton. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene. NA
PRKAG2 51422 protein kinase AMP-activated non-catalytic subunit gamma 2 ENSG00000106617 AMP-activated protein kinase (AMPK) is a heterotrimeric protein composed of a catalytic alpha subunit, a noncatalytic beta subunit, and a noncatalytic regulatory gamma subunit. Various forms of each of these subunits exist, encoded by different genes. AMPK is an important energy-sensing enzyme that monitors cellular energy status and functions by inactivating key enzymes involved in regulating de novo biosynthesis of fatty acid and cholesterol. This gene is a member of the AMPK gamma subunit family. Mutations in this gene have been associated with Wolff-Parkinson-White syndrome, familial hypertrophic cardiomyopathy, and glycogen storage disease of the heart. Alternate transcriptional splice variants, encoding different isoforms, have been characterized. NA
COL6A2 1292 collagen type VI alpha 2 ENSG00000142173 This gene encodes one of the three alpha chains of type VI collagen, a beaded filament collagen found in most connective tissues. The product of this gene contains several domains similar to von Willebrand Factor type A domains. These domains have been shown to bind extracellular matrix proteins, an interaction that explains the importance of this collagen in organizing matrix components. Mutations in this gene are associated with Bethlem myopathy and Ullrich scleroatonic muscular dystrophy. Three transcript variants have been identified for this gene. NA
UCP3 7352 uncoupling protein 3 ENSG00000175564 Mitochondrial uncoupling proteins (UCP) are members of the larger family of mitochondrial anion carrier proteins (MACP). UCPs separate oxidative phosphorylation from ATP synthesis with energy dissipated as heat, also referred to as the mitochondrial proton leak. UCPs facilitate the transfer of anions from the inner to the outer mitochondrial membrane and the return transfer of protons from the outer to the inner mitochondrial membrane. They also reduce the mitochondrial membrane potential in mammalian cells. The different UCPs have tissue-specific expression; this gene is primarily expressed in skeletal muscle. This gene’s protein product is postulated to protect mitochondria against lipid-induced oxidative stress. Expression levels of this gene increase when fatty acid supplies to mitochondria exceed their oxidation capacity and the protein enables the export of fatty acids from mitochondria. UCPs contain the three solcar protein domains typically found in MACPs. Two splice variants have been found for this gene. NA
DHRS7 51635 dehydrogenase/reductase 7 ENSG00000100612 This gene encodes a member of the short-chain dehydrogenases/reductases (SDR) family, which has over 46,000 members. Members in this family are enzymes that metabolize many different compounds, such as steroid hormones, prostaglandins, retinoids, lipids and xenobiotics. NA
VASP 7408 vasodilator-stimulated phosphoprotein ENSG00000125753 Vasodilator-stimulated phosphoprotein (VASP) is a member of the Ena-VASP protein family. Ena-VASP family members contain an EVH1 N-terminal domain that binds proteins containing E/DFPPPPXD/E motifs and targets Ena-VASP proteins to focal adhesions. In the mid-region of the protein, family members have a proline-rich domain that binds SH3 and WW domain-containing proteins. Their C-terminal EVH2 domain mediates tetramerization and binds both G and F actin. VASP is associated with filamentous actin formation and likely plays a widespread role in cell adhesion and motility. VASP may also be involved in the intracellular signaling pathways that regulate integrin-extracellular matrix interactions. VASP is regulated by the cyclic nucleotide-dependent kinases PKA and PKG. NA
write.table(as.factor(out$query), paste0("../utilities/GTEX2013_sparse_fac_sqrt/gene_names_clus_",19,".txt"), col.names = FALSE,
            row.names=FALSE, quote=FALSE);
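
The write step above is repeated for every factor with only the factor index changing. A minimal sketch of wrapping it in a helper is shown below; the function name `write_factor_genes` and its arguments are hypothetical and not part of the original analysis, and the usage example writes to a temporary directory rather than `../utilities/GTEX2013_sparse_fac_sqrt/`.

```r
# Hypothetical helper (not in the original script): write one factor's
# query IDs to gene_names_clus_<k>.txt, one ID per line.
write_factor_genes <- function(ids, k, out_dir) {
  path <- file.path(out_dir, paste0("gene_names_clus_", k, ".txt"))
  write.table(ids, path, col.names = FALSE, row.names = FALSE, quote = FALSE)
  invisible(path)
}

# Usage with a temporary directory and two IDs taken from the factor 19 table.
p <- write_factor_genes(c("ENSG00000142173", "ENSG00000175564"), 19, tempdir())
readLines(p)
```

In the report itself the call would presumably be `write_factor_genes(out$query, 19, "../utilities/GTEX2013_sparse_fac_sqrt")`, which keeps the file naming identical to the existing code.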

Factor 20 Annotations

out <- mygene::queryMany(gene_list[20,],  scopes="ensembl.gene", fields=c("name", "summary", "symbol"), species="human");
## Finished
## Pass returnall=TRUE to return lists of duplicate or missing query terms.
kable(as.data.frame(out))
symbol X_id summary query name notfound
HBB 3043 The alpha (HBA) and beta (HBB) loci determine the structure of the 2 types of polypeptide chains in adult hemoglobin, Hb A. The normal adult hemoglobin tetramer consists of two alpha chains and two beta chains. Mutant beta globin causes sickle cell anemia. Absence of beta chain causes beta-zero-thalassemia. Reduced amounts of detectable beta globin causes beta-plus-thalassemia. The order of the genes in the beta-globin cluster is 5’-epsilon – gamma-G – gamma-A – delta – beta–3’. ENSG00000244734 hemoglobin subunit beta NA
CYP17A1 1586 This gene encodes a member of the cytochrome P450 superfamily of enzymes. The cytochrome P450 proteins are monooxygenases which catalyze many reactions involved in drug metabolism and synthesis of cholesterol, steroids and other lipids. This protein localizes to the endoplasmic reticulum. It has both 17alpha-hydroxylase and 17,20-lyase activities and is a key enzyme in the steroidogenic pathway that produces progestins, mineralocorticoids, glucocorticoids, androgens, and estrogens. Mutations in this gene are associated with isolated steroid-17 alpha-hydroxylase deficiency, 17-alpha-hydroxylase/17,20-lyase deficiency, pseudohermaphroditism, and adrenal hyperplasia. ENSG00000148795 cytochrome P450 family 17 subfamily A member 1 NA
CYP11B1 1584 This gene encodes a member of the cytochrome P450 superfamily of enzymes. The cytochrome P450 proteins are monooxygenases which catalyze many reactions involved in drug metabolism and synthesis of cholesterol, steroids and other lipids. This protein localizes to the mitochondrial inner membrane and is involved in the conversion of progesterone to cortisol in the adrenal cortex. Mutations in this gene cause congenital adrenal hyperplasia due to 11-beta-hydroxylase deficiency. Transcript variants encoding different isoforms have been noted for this gene. ENSG00000160882 cytochrome P450 family 11 subfamily B member 1 NA
PGC 5225 This gene encodes an aspartic proteinase that belongs to the peptidase family A1. The encoded protein is a digestive enzyme that is produced in the stomach and constitutes a major component of the gastric mucosa. This protein is also secreted into the serum. This protein is synthesized as an inactive zymogen that includes a highly basic prosegment. This enzyme is converted into its active mature form at low pH by sequential cleavage of the prosegment that is carried out by the enzyme itself. Polymorphisms in this gene are associated with susceptibility to gastric cancers. Serum levels of this enzyme are used as a biomarker for certain gastric diseases including Helicobacter pylori related gastritis. Alternate splicing results in multiple transcript variants. A pseudogene of this gene is found on chromosome 1. ENSG00000096088 progastricsin NA
MBP 4155 The protein encoded by the classic MBP gene is a major constituent of the myelin sheath of oligodendrocytes and Schwann cells in the nervous system. However, MBP-related transcripts are also present in the bone marrow and the immune system. These mRNAs arise from the long MBP gene (otherwise called ‘Golli-MBP’) that contains 3 additional exons located upstream of the classic MBP exons. Alternative splicing from the Golli and the MBP transcription start sites gives rise to 2 sets of MBP-related transcripts and gene products. The Golli mRNAs contain 3 exons unique to Golli-MBP, spliced in-frame to 1 or more MBP exons. They encode hybrid proteins that have N-terminal Golli aa sequence linked to MBP aa sequence. The second family of transcripts contain only MBP exons and produce the well characterized myelin basic proteins. This complex gene structure is conserved among species suggesting that the MBP transcription unit is an integral part of the Golli transcription unit and that this arrangement is important for the function and/or regulation of these genes. ENSG00000197971 myelin basic protein NA
DES 1674 This gene encodes a muscle-specific class III intermediate filament. Homopolymers of this protein form a stable intracytoplasmic filamentous network connecting myofibrils to each other and to the plasma membrane. Mutations in this gene are associated with desmin-related myopathy, a familial cardiac and skeletal myopathy (CSM), and with distal myopathies. ENSG00000175084 desmin NA
HSPD1 3329 This gene encodes a member of the chaperonin family. The encoded mitochondrial protein may function as a signaling molecule in the innate immune system. This protein is essential for the folding and assembly of newly imported proteins in the mitochondria. This gene is adjacent to a related family member and the region between the 2 genes functions as a bidirectional promoter. Several pseudogenes have been associated with this gene. Two transcript variants encoding the same protein have been identified for this gene. Mutations associated with this gene cause autosomal recessive spastic paraplegia 13. ENSG00000144381 heat shock protein family D (Hsp60) member 1 NA
HBA2 3040 The human alpha globin gene cluster located on chromosome 16 spans about 30 kb and includes seven loci: 5’- zeta - pseudozeta - mu - pseudoalpha-1 - alpha-2 - alpha-1 - theta - 3’. The alpha-2 (HBA2) and alpha-1 (HBA1) coding sequences are identical. These genes differ slightly over the 5’ untranslated regions and the introns, but they differ significantly over the 3’ untranslated regions. Two alpha chains plus two beta chains constitute HbA, which in normal adult life comprises about 97% of the total hemoglobin; alpha chains combine with delta chains to constitute HbA-2, which with HbF (fetal hemoglobin) makes up the remaining 3% of adult hemoglobin. Alpha thalassemias result from deletions of each of the alpha genes as well as deletions of both HBA2 and HBA1; some nondeletion alpha thalassemias have also been reported. ENSG00000188536 hemoglobin subunit alpha 2 NA
TPM2 7169 This gene encodes beta-tropomyosin, a member of the actin filament binding protein family, and mainly expressed in slow, type 1 muscle fibers. Mutations in this gene can alter the expression of other sarcomeric tropomyosin proteins, and cause cap disease, nemaline myopathy and distal arthrogryposis syndromes. Alternatively spliced transcript variants encoding different isoforms have been found for this gene. ENSG00000198467 tropomyosin 2 (beta) NA
AKR1B1 231 This gene encodes a member of the aldo/keto reductase superfamily, which consists of more than 40 known enzymes and proteins. This member catalyzes the reduction of a number of aldehydes, including the aldehyde form of glucose, and is thereby implicated in the development of diabetic complications by catalyzing the reduction of glucose to sorbitol. Multiple pseudogenes have been identified for this gene. The nomenclature system used by the HUGO Gene Nomenclature Committee to define human aldo-keto reductase family members is known to differ from that used by the Mouse Genome Informatics database. ENSG00000085662 aldo-keto reductase family 1 member B NA
STAR 6770 The protein encoded by this gene plays a key role in the acute regulation of steroid hormone synthesis by enhancing the conversion of cholesterol into pregnenolone. This protein permits the cleavage of cholesterol into pregnenolone by mediating the transport of cholesterol from the outer mitochondrial membrane to the inner mitochondrial membrane. Mutations in this gene are a cause of congenital lipoid adrenal hyperplasia (CLAH), also called lipoid CAH. A pseudogene of this gene is located on chromosome 13. ENSG00000147465 steroidogenic acute regulatory protein NA
LIPF 8513 This gene encodes gastric lipase, an enzyme involved in the digestion of dietary triglycerides in the gastrointestinal tract, and responsible for 30% of fat digestion processes occurring in human. It is secreted by gastric chief cells in the fundic mucosa of the stomach, and it hydrolyzes the ester bonds of triglycerides under acidic pH conditions. The gene is a member of a conserved gene family of lipases that play distinct roles in neutral lipid metabolism. Several transcript variants encoding different isoforms have been found for this gene. ENSG00000182333 lipase F, gastric type NA
TXNRD1 7296 This gene encodes a member of the family of pyridine nucleotide oxidoreductases. This protein reduces thioredoxins as well as other substrates, and plays a role in selenium metabolism and protection against oxidative stress. The functional enzyme is thought to be a homodimer which uses FAD as a cofactor. Each subunit contains a selenocysteine (Sec) residue which is required for catalytic activity. The selenocysteine is encoded by the UGA codon that normally signals translation termination. The 3’ UTR of selenocysteine-containing genes have a common stem-loop structure, the sec insertion sequence (SECIS), that is necessary for the recognition of UGA as a Sec codon rather than as a stop signal. Alternative splicing results in several transcript variants encoding the same or different isoforms. ENSG00000198431 thioredoxin reductase 1 NA
NA NA NA ENSG00000090920 NA TRUE
ATP2B4 493 The protein encoded by this gene belongs to the family of P-type primary ion transport ATPases characterized by the formation of an aspartyl phosphate intermediate during the reaction cycle. These enzymes remove bivalent calcium ions from eukaryotic cells against very large concentration gradients and play a critical role in intracellular calcium homeostasis. The mammalian plasma membrane calcium ATPase isoforms are encoded by at least four separate genes and the diversity of these enzymes is further increased by alternative splicing of transcripts. The expression of different isoforms and splice variants is regulated in a developmental, tissue- and cell type-specific manner, suggesting that these pumps are functionally adapted to the physiological needs of particular cells and tissues. This gene encodes the plasma membrane calcium ATPase isoform 4. Alternatively spliced transcript variants encoding different isoforms have been identified. ENSG00000058668 ATPase plasma membrane Ca2+ transporting 4 NA
CYP21A2 1589 This gene encodes a member of the cytochrome P450 superfamily of enzymes. The cytochrome P450 proteins are monooxygenases which catalyze many reactions involved in drug metabolism and synthesis of cholesterol, steroids and other lipids. This protein localizes to the endoplasmic reticulum and hydroxylates steroids at the 21 position. Its activity is required for the synthesis of steroid hormones including cortisol and aldosterone. Mutations in this gene cause congenital adrenal hyperplasia. A related pseudogene is located near this gene; gene conversion events involving the functional gene and the pseudogene are thought to account for many cases of steroid 21-hydroxylase deficiency. Two transcript variants encoding different isoforms have been found for this gene. ENSG00000231852 cytochrome P450 family 21 subfamily A member 2 NA
FOSL2 2355 The Fos gene family consists of 4 members: FOS, FOSB, FOSL1, and FOSL2. These genes encode leucine zipper proteins that can dimerize with proteins of the JUN family, thereby forming the transcription factor complex AP-1. As such, the FOS proteins have been implicated as regulators of cell proliferation, differentiation, and transformation. ENSG00000075426 FOS like 2, AP-1 transcription factor subunit NA
HSPA8 3312 This gene encodes a member of the heat shock protein 70 family, which contains both heat-inducible and constitutively expressed members. This protein belongs to the latter group, which are also referred to as heat-shock cognate proteins. It functions as a chaperone, and binds to nascent polypeptides to facilitate correct folding. It also functions as an ATPase in the disassembly of clathrin-coated vesicles during transport of membrane components through the cell. Alternatively spliced transcript variants encoding different isoforms have been found for this gene. ENSG00000109971 heat shock protein family A (Hsp70) member 8 NA
SLC44A2 57153 NA ENSG00000129353 solute carrier family 44 member 2 NA
MYL9 10398 Myosin, a structural component of muscle, consists of two heavy chains and four light chains. The protein encoded by this gene is a myosin light chain that may regulate muscle contraction by modulating the ATPase activity of myosin heads. The encoded protein binds calcium and is activated by myosin light chain kinase. Two transcript variants encoding different isoforms have been found for this gene. ENSG00000101335 myosin light chain 9 NA
KRT10 3858 This gene encodes a member of the type I (acidic) cytokeratin family, which belongs to the superfamily of intermediate filament (IF) proteins. Keratins are heteropolymeric structural proteins which form the intermediate filament. These filaments, along with actin microfilaments and microtubules, compose the cytoskeleton of epithelial cells. Mutations in this gene are associated with epidermolytic hyperkeratosis. This gene is located within a cluster of keratin family members on chromosome 17q21. ENSG00000186395 keratin 10 NA
HSP90AA1 3320 The protein encoded by this gene is an inducible molecular chaperone that functions as a homodimer. The encoded protein aids in the proper folding of specific target proteins by use of an ATPase activity that is modulated by co-chaperones. Two transcript variants encoding different isoforms have been found for this gene. ENSG00000080824 heat shock protein 90kDa alpha family class A member 1 NA
STAT3 6774 The protein encoded by this gene is a member of the STAT protein family. In response to cytokines and growth factors, STAT family members are phosphorylated by the receptor associated kinases, and then form homo- or heterodimers that translocate to the cell nucleus where they act as transcription activators. This protein is activated through phosphorylation in response to various cytokines and growth factors including IFNs, EGF, IL5, IL6, HGF, LIF and BMP2. This protein mediates the expression of a variety of genes in response to cell stimuli, and thus plays a key role in many cellular processes such as cell growth and apoptosis. The small GTPase Rac1 has been shown to bind and regulate the activity of this protein. PIAS3 protein is a specific inhibitor of this protein. Mutations in this gene are associated with infantile-onset multisystem autoimmune disease and hyper-immunoglobulin E syndrome. Alternative splicing results in multiple transcript variants encoding distinct isoforms. ENSG00000168610 signal transducer and activator of transcription 3 NA
CYP11A1 1583 This gene encodes a member of the cytochrome P450 superfamily of enzymes. The cytochrome P450 proteins are monooxygenases which catalyze many reactions involved in drug metabolism and synthesis of cholesterol, steroids and other lipids. This protein localizes to the mitochondrial inner membrane and catalyzes the conversion of cholesterol to pregnenolone, the first and rate-limiting step in the synthesis of the steroid hormones. Two transcript variants encoding different isoforms have been found for this gene. The cellular location of the smaller isoform is unclear since it lacks the mitochondrial-targeting transit peptide. ENSG00000140459 cytochrome P450 family 11 subfamily A member 1 NA
RP11-862L9.3 ENSG00000266844 NA ENSG00000266844 NA NA
ALAS1 211 This gene encodes the mitochondrial enzyme which catalyzes the rate-limiting step in heme (iron-protoporphyrin) biosynthesis. The enzyme encoded by this gene is the housekeeping enzyme; a separate gene encodes a form of the enzyme that is specific for erythroid tissue. The level of the mature encoded protein is regulated by heme: high levels of heme down-regulate the mature enzyme in mitochondria while low heme levels up-regulate. A pseudogene of this gene is located on chromosome 12. Alternative splicing results in multiple transcript variants encoding different isoforms. ENSG00000023330 5’-aminolevulinate synthase 1 NA
C7 730 C7 is a component of the complement system. It participates in the formation of Membrane Attack Complex (MAC). People with C7 deficiency are prone to bacterial infection. ENSG00000112936 complement component 7 NA
NA NA NA ENSG00000259716 NA TRUE
KRT19 3880 The protein encoded by this gene is a member of the keratin family. The keratins are intermediate filament proteins responsible for the structural integrity of epithelial cells and are subdivided into cytokeratins and hair keratins. The type I cytokeratins consist of acidic proteins which are arranged in pairs of heterotypic keratin chains. Unlike its related family members, this smallest known acidic cytokeratin is not paired with a basic cytokeratin in epithelial cells. It is specifically expressed in the periderm, the transiently superficial layer that envelopes the developing epidermis. The type I cytokeratins are clustered in a region of chromosome 17q12-q21. ENSG00000171345 keratin 19 NA
STIP1 10963 STIP1 is an adaptor protein that coordinates the functions of HSP70 (see HSPA1A; MIM 140550) and HSP90 (see HSP90AA1; MIM 140571) in protein folding. It is thought to assist in the transfer of proteins from HSP70 to HSP90 by binding both HSP90 and substrate-bound HSP70. STIP1 also stimulates the ATPase activity of HSP70 and inhibits the ATPase activity of HSP90, suggesting that it regulates both the conformations and ATPase cycles of these chaperones (Song and Masison, 2005 [PubMed 16100115]). ENSG00000168439 stress induced phosphoprotein 1 NA
THBS1 7057 The protein encoded by this gene is a subunit of a disulfide-linked homotrimeric protein. This protein is an adhesive glycoprotein that mediates cell-to-cell and cell-to-matrix interactions. This protein can bind to fibrinogen, fibronectin, laminin, type V collagen and integrins alpha-V/beta-1. This protein has been shown to play roles in platelet aggregation, angiogenesis, and tumorigenesis. ENSG00000137801 thrombospondin 1 NA
GAPDH 2597 This gene encodes a member of the glyceraldehyde-3-phosphate dehydrogenase protein family. The encoded protein has been identified as a moonlighting protein based on its ability to perform mechanistically distinct functions. The product of this gene catalyzes an important energy-yielding step in carbohydrate metabolism, the reversible oxidative phosphorylation of glyceraldehyde-3-phosphate in the presence of inorganic phosphate and nicotinamide adenine dinucleotide (NAD). The encoded protein has additionally been identified to have uracil DNA glycosylase activity in the nucleus. Also, this protein contains a peptide that has antimicrobial activity against E. coli, P. aeruginosa, and C. albicans. Studies of a similar protein in mouse have assigned a variety of additional functions including nitrosylation of nuclear proteins, the regulation of mRNA stability, and acting as a transferrin receptor on the cell surface of macrophage. Many pseudogenes similar to this locus are present in the human genome. Alternative splicing results in multiple transcript variants. ENSG00000111640 glyceraldehyde-3-phosphate dehydrogenase NA
PGA3 643834 This gene encodes a protein precursor of the digestive enzyme pepsin, a member of the peptidase A1 family of endopeptidases. The encoded precursor is secreted by gastric chief cells and undergoes autocatalytic cleavage in acidic conditions to form the active enzyme, which functions in the digestion of dietary proteins. This gene is found in a cluster of related genes on chromosome 11, each of which encodes one of multiple pepsinogens. Pepsinogen levels in serum may serve as a biomarker for atrophic gastritis and gastric cancer. ENSG00000229859 pepsinogen 3, group I (pepsinogen A) NA
MYH11 4629 The protein encoded by this gene is a smooth muscle myosin belonging to the myosin heavy chain family. The gene product is a subunit of a hexameric protein that consists of two heavy chain subunits and two pairs of non-identical light chain subunits. It functions as a major contractile protein, converting chemical energy into mechanical energy through the hydrolysis of ATP. The gene encoding a human ortholog of rat NUDE1 is transcribed from the reverse strand of this gene, and its 3’ end overlaps with that of the latter. The pericentric inversion of chromosome 16 [inv(16)(p13q22)] produces a chimeric transcript that encodes a protein consisting of the first 165 residues from the N terminus of core-binding factor beta in a fusion with the C-terminal portion of the smooth muscle myosin heavy chain. This chromosomal rearrangement is associated with acute myeloid leukemia of the M4Eo subtype. Alternative splicing generates isoforms that are differentially expressed, with ratios changing during muscle cell maturation. Alternatively spliced transcript variants encoding different isoforms have been identified. ENSG00000133392 myosin, heavy chain 11, smooth muscle NA
HSPA9 3313 This gene encodes a member of the heat shock protein 70 gene family. The encoded protein is primarily localized to the mitochondria but is also found in the endoplasmic reticulum, plasma membrane and cytoplasmic vesicles. This protein is a heat-shock cognate protein. This protein plays a role in cell proliferation, stress response and maintenance of the mitochondria. A pseudogene of this gene is found on chromosome 2. ENSG00000113013 heat shock protein family A (Hsp70) member 9 NA
AIF1L 83543 NA ENSG00000126878 allograft inflammatory factor 1 like NA
GFAP 2670 This gene encodes one of the major intermediate filament proteins of mature astrocytes. It is used as a marker to distinguish astrocytes from other glial cells during development. Mutations in this gene cause Alexander disease, a rare disorder of astrocytes in the central nervous system. Alternative splicing results in multiple transcript variants encoding distinct isoforms. ENSG00000131095 glial fibrillary acidic protein NA
EMILIN1 11117 This gene encodes an extracellular matrix glycoprotein that is characterized by an N-terminal microfibril interface domain, a coiled-coil alpha-helical domain, a collagenous domain and a C-terminal globular C1q domain. The encoded protein associates with elastic fibers at the interface between elastin and microfibrils and may play a role in the development of elastic tissues including large blood vessels, dermis, heart and lung. ENSG00000138080 elastin microfibril interfacer 1 NA
PPP1R18 170954 Protein phosphatase-1 (PP1; see MIM 176875) interacts with regulatory subunits that target the enzyme to different cellular locations and change its activity toward specific substrates. Phostensin is a regulatory subunit that targets PP1 to F-actin (see MIM 102610) cytoskeleton (Kao et al., 2007 [PubMed 17374523]). ENSG00000146112 protein phosphatase 1 regulatory subunit 18 NA
HSP90AB1 3326 This gene encodes a member of the heat shock protein 90 family; these proteins are involved in signal transduction, protein folding and degradation and morphological evolution. This gene encodes the constitutive form of the cytosolic 90 kDa heat-shock protein and is thought to play a role in gastric apoptosis and inflammation. Alternative splicing results in multiple transcript variants. Pseudogenes have been identified on multiple chromosomes. ENSG00000096384 heat shock protein 90kDa alpha family class B member 1 NA
ACTB 60 This gene encodes one of six different actin proteins. Actins are highly conserved proteins that are involved in cell motility, structure, and integrity. This actin is a major constituent of the contractile apparatus and one of the two nonmuscle cytoskeletal actins. ENSG00000075624 actin, beta NA
EIF4G1 1981 The protein encoded by this gene is a component of the multi-subunit protein complex EIF4F. This complex facilitates the recruitment of mRNA to the ribosome, which is a rate-limiting step during the initiation phase of protein synthesis. The recognition of the mRNA cap and the ATP-dependent unwinding of 5’-terminal secondary structure is catalyzed by factors in this complex. The subunit encoded by this gene is a large scaffolding protein that contains binding sites for other members of the EIF4F complex. A domain at its N-terminus can also interact with the poly(A)-binding protein, which may mediate the circularization of mRNA during translation. Alternative splicing results in multiple transcript variants, some of which are derived from alternative promoter usage. ENSG00000114867 eukaryotic translation initiation factor 4 gamma 1 NA
C10orf10 11067 The expression of this gene is induced by fasting as well as by progesterone. The protein encoded by this gene contains a t-synaptosome-associated protein receptor (SNARE) coiled-coil homology domain and a peroxisomal targeting signal. Production of the encoded protein leads to phosphorylation and activation of the transcription factor ELK1. ENSG00000165507 chromosome 10 open reading frame 10 NA
ST14 6768 The protein encoded by this gene is an epithelial-derived, integral membrane serine protease. This protease forms a complex with the Kunitz-type serine protease inhibitor, HAI-1, and is found to be activated by sphingosine 1-phosphate. This protease has been shown to cleave and activate hepatocyte growth factor/scattering factor, and urokinase plasminogen activator, which suggest the function of this protease as an epithelial membrane activator for other proteases and latent growth factors. The expression of this protease has been associated with breast, colon, prostate, and ovarian tumors, which implicates its role in cancer invasion, and metastasis. ENSG00000149418 suppression of tumorigenicity 14 NA
TPP1 1200 This gene encodes a member of the sedolisin family of serine proteases. The protease functions in the lysosome to cleave N-terminal tripeptides from substrates, and has weaker endopeptidase activity. It is synthesized as a catalytically-inactive enzyme which is activated and auto-proteolyzed upon acidification. Mutations in this gene result in late-infantile neuronal ceroid lipofuscinosis, which is associated with the failure to degrade specific neuropeptides and a subunit of ATP synthase in the lysosome. ENSG00000166340 tripeptidyl peptidase 1 NA
HBA1 3039 The human alpha globin gene cluster located on chromosome 16 spans about 30 kb and includes seven loci: 5’- zeta - pseudozeta - mu - pseudoalpha-1 - alpha-2 - alpha-1 - theta - 3’. The alpha-2 (HBA2) and alpha-1 (HBA1) coding sequences are identical. These genes differ slightly over the 5’ untranslated regions and the introns, but they differ significantly over the 3’ untranslated regions. Two alpha chains plus two beta chains constitute HbA, which in normal adult life comprises about 97% of the total hemoglobin; alpha chains combine with delta chains to constitute HbA-2, which with HbF (fetal hemoglobin) makes up the remaining 3% of adult hemoglobin. Alpha thalassemias result from deletions of each of the alpha genes as well as deletions of both HBA2 and HBA1; some nondeletion alpha thalassemias have also been reported. ENSG00000206172 hemoglobin subunit alpha 1 NA
RAB11FIP4 84440 Proteins of the large Rab GTPase family (see RAB1A; MIM 179508) have regulatory roles in the formation, targeting, and fusion of intracellular transport vesicles. RAB11FIP4 is one of many proteins that interact with and regulate Rab GTPases (Hales et al., 2001 [PubMed 11495908]). ENSG00000131242 RAB11 family interacting protein 4 NA
KRT2 3849 The protein encoded by this gene is a member of the keratin gene family. The type II cytokeratins consist of basic or neutral proteins which are arranged in pairs of heterotypic keratin chains coexpressed during differentiation of simple and stratified epithelial tissues. This type II cytokeratin is expressed largely in the upper spinous layer of epidermal keratinocytes and mutations in this gene have been associated with bullous congenital ichthyosiform erythroderma. The type II cytokeratins are clustered in a region of chromosome 12q12-q13. ENSG00000172867 keratin 2 NA
SLC2A3 6515 NA ENSG00000059804 solute carrier family 2 member 3 NA
PTGFRN 5738 NA ENSG00000134247 prostaglandin F2 receptor inhibitor NA
DLG5 9231 This gene encodes a member of the family of discs large (DLG) homologs, a subset of the membrane-associated guanylate kinase (MAGUK) superfamily. The MAGUK proteins are composed of a catalytically inactive guanylate kinase domain, in addition to PDZ and SH3 domains, and are thought to function as scaffolding molecules at sites of cell-cell contact. The protein encoded by this gene localizes to the plasma membrane and cytoplasm, and interacts with components of adherens junctions and the cytoskeleton. It is proposed to function in the transmission of extracellular signals to the cytoskeleton and in the maintenance of epithelial cell structure. Alternative splice variants have been described but their biological nature has not been determined. ENSG00000151208 discs large MAGUK scaffold protein 5 NA
DNAJA1 3301 This gene encodes a member of the DnaJ family of proteins, which act as heat shock protein 70 cochaperones. Heat shock proteins facilitate protein folding, trafficking, prevention of aggregation, and proteolytic degradation. Members of this family are characterized by a highly conserved N-terminal J domain, a glycine/phenylalanine-rich region, four CxxCxGxG zinc finger repeats, and a C-terminal substrate-binding domain. The J domain mediates the interaction with heat shock protein 70 to recruit substrates and regulate ATP hydrolysis activity. In humans, this gene has been implicated in positive regulation of virus replication through co-option by the influenza A virus. Several pseudogenes of this gene are found on other chromosomes. ENSG00000086061 DnaJ heat shock protein family (Hsp40) member A1 NA
CXCL14 9547 This antimicrobial gene belongs to the cytokine gene family which encode secreted proteins involved in immunoregulatory and inflammatory processes. The protein encoded by this gene is structurally related to the CXC (Cys-X-Cys) subfamily of cytokines. Members of this subfamily are characterized by two cysteines separated by a single amino acid. This cytokine displays chemotactic activity for monocytes but not for lymphocytes, dendritic cells, neutrophils or macrophages. It has been implicated that this cytokine is involved in the homeostasis of monocyte-derived macrophages rather than in inflammation. ENSG00000145824 C-X-C motif chemokine ligand 14 NA
C4B 721 This gene encodes the basic form of complement factor 4, part of the classical activation pathway. The protein is expressed as a single chain precursor which is proteolytically cleaved into a trimer of alpha, beta, and gamma chains prior to secretion. The trimer provides a surface for interaction between the antigen-antibody complex and other complement components. The alpha chain may be cleaved to release C4 anaphylatoxin, a mediator of local inflammation. Deficiency of this protein is associated with systemic lupus erythematosus. This gene localizes to the major histocompatibility complex (MHC) class III region on chromosome 6. Varying haplotypes of this gene cluster exist, such that individuals may have 1, 2, or 3 copies of this gene. In addition, this gene exists as a long form and a short form due to the presence or absence of a 6.4 kb endogenous HERV-K retrovirus in intron 9. ENSG00000224389 complement component 4B (Chido blood group) NA
HSD11B2 3291 There are at least two isozymes of the corticosteroid 11-beta-dehydrogenase, a microsomal enzyme complex responsible for the interconversion of cortisol and cortisone. The type I isozyme has both 11-beta-dehydrogenase (cortisol to cortisone) and 11-oxoreductase (cortisone to cortisol) activities. The type II isozyme, encoded by this gene, has only 11-beta-dehydrogenase activity. In aldosterone-selective epithelial tissues such as the kidney, the type II isozyme catalyzes the glucocorticoid cortisol to the inactive metabolite cortisone, thus preventing illicit activation of the mineralocorticoid receptor. In tissues that do not express the mineralocorticoid receptor, such as the placenta and testis, it protects cells from the growth-inhibiting and/or pro-apoptotic effects of cortisol, particularly during embryonic development. Mutations in this gene cause the syndrome of apparent mineralocorticoid excess and hypertension. ENSG00000176387 hydroxysteroid 11-beta dehydrogenase 2 NA
COL1A2 1278 This gene encodes the pro-alpha2 chain of type I collagen whose triple helix comprises two alpha1 chains and one alpha2 chain. Type I is a fibril-forming collagen found in most connective tissues and is abundant in bone, cornea, dermis and tendon. Mutations in this gene are associated with osteogenesis imperfecta types I-IV, Ehlers-Danlos syndrome type VIIB, recessive Ehlers-Danlos syndrome Classical type, idiopathic osteoporosis, and atypical Marfan syndrome. Symptoms associated with mutations in this gene, however, tend to be less severe than mutations in the gene for the alpha1 chain of type I collagen (COL1A1) reflecting the different role of alpha2 chains in matrix integrity. Three transcripts, resulting from the use of alternate polyadenylation signals, have been identified for this gene. ENSG00000164692 collagen type I alpha 2 chain NA
FN1 2335 This gene encodes fibronectin, a glycoprotein present in a soluble dimeric form in plasma, and in a dimeric or multimeric form at the cell surface and in extracellular matrix. The encoded preproprotein is proteolytically processed to generate the mature protein. Fibronectin is involved in cell adhesion and migration processes including embryogenesis, wound healing, blood coagulation, host defense, and metastasis. The gene has three regions subject to alternative splicing, with the potential to produce 20 different transcript variants, at least one of which encodes an isoform that undergoes proteolytic processing. The full-length nature of some variants has not been determined. ENSG00000115414 fibronectin 1 NA
PPP1R1B 84152 This gene encodes a bifunctional signal transduction molecule. Dopaminergic and glutamatergic receptor stimulation regulates its phosphorylation and function as a kinase or phosphatase inhibitor. As a target for dopamine, this gene may serve as a therapeutic target for neurologic and psychiatric disorders. Multiple transcript variants encoding different isoforms have been found for this gene. ENSG00000131771 protein phosphatase 1 regulatory inhibitor subunit 1B NA
HIF1A 3091 This gene encodes the alpha subunit of transcription factor hypoxia-inducible factor-1 (HIF-1), which is a heterodimer composed of an alpha and a beta subunit. HIF-1 functions as a master regulator of cellular and systemic homeostatic response to hypoxia by activating transcription of many genes, including those involved in energy metabolism, angiogenesis, apoptosis, and other genes whose protein products increase oxygen delivery or facilitate metabolic adaptation to hypoxia. HIF-1 thus plays an essential role in embryonic vascularization, tumor angiogenesis and pathophysiology of ischemic disease. Alternatively spliced transcript variants encoding different isoforms have been identified for this gene. ENSG00000100644 hypoxia inducible factor 1 alpha subunit NA
HIP1R 9026 NA ENSG00000130787 huntingtin interacting protein 1 related NA
CTNNAL1 8727 NA ENSG00000119326 catenin alpha like 1 NA
FBLN1 2192 Fibulin 1 is a secreted glycoprotein that becomes incorporated into a fibrillar extracellular matrix. Calcium-binding is apparently required to mediate its binding to laminin and nidogen. It mediates platelet adhesion via binding fibrinogen. Four splice variants which differ in the 3’ end have been identified. Each variant encodes a different isoform, but no functional distinctions have been identified among the four variants. ENSG00000077942 fibulin 1 NA
DCN 1634 This gene encodes a member of the small leucine-rich proteoglycan family of proteins. Alternative splicing results in multiple transcript variants, at least one of which encodes a preproprotein that is proteolytically processed to generate the mature protein. This protein plays a role in collagen fibril assembly. Binding of this protein to multiple cell surface receptors mediates its role in tumor suppression, including a stimulatory effect on autophagy and inflammation and an inhibitory effect on angiogenesis and tumorigenesis. This gene and the related gene biglycan are thought to be the result of a gene duplication. Mutations in this gene are associated with congenital stromal corneal dystrophy in human patients. ENSG00000011465 decorin NA
SOX9 6662 The protein encoded by this gene recognizes the sequence CCTTGAG along with other members of the HMG-box class DNA-binding proteins. It acts during chondrocyte differentiation and, with steroidogenic factor 1, regulates transcription of the anti-Muellerian hormone (AMH) gene. Deficiencies lead to the skeletal malformation syndrome campomelic dysplasia, frequently with sex reversal. ENSG00000125398 SRY-box 9 NA
LDLR 3949 The low density lipoprotein receptor (LDLR) gene family consists of cell surface proteins involved in receptor-mediated endocytosis of specific ligands. Low density lipoprotein (LDL) is normally bound at the cell membrane and taken into the cell ending up in lysosomes where the protein is degraded and the cholesterol is made available for repression of microsomal enzyme 3-hydroxy-3-methylglutaryl coenzyme A (HMG CoA) reductase, the rate-limiting step in cholesterol synthesis. At the same time, a reciprocal stimulation of cholesterol ester synthesis takes place. Mutations in this gene cause the autosomal dominant disorder, familial hypercholesterolemia. Alternate splicing results in multiple transcript variants. ENSG00000130164 low density lipoprotein receptor NA
ADIRF 10974 APM2 gene is exclusively expressed in adipose tissue. Its function is currently unknown. ENSG00000148671 adipogenesis regulatory factor NA
ZFP36 7538 NA ENSG00000128016 ZFP36 ring finger protein NA
C1S 716 This gene encodes a serine protease, which is a major constituent of the human complement subcomponent C1. C1s associates with two other complement components C1r and C1q in order to yield the first component of the serum complement system. Defects in this gene are the cause of selective C1s deficiency. ENSG00000182326 complement component 1, s subcomponent NA
KANK2 25959 NA ENSG00000197256 KN motif and ankyrin repeat domains 2 NA
GP2 2813 This gene encodes an integral membrane protein that is secreted from intracellular zymogen granules and associates with the plasma membrane via glycosylphosphatidylinositol (GPI) linkage. The encoded protein binds pathogens such as enterobacteria, thereby playing an important role in the innate immune response. The C-terminus of this protein is related to the C-terminus of the protein encoded by the neighboring gene, uromodulin (UMOD). Alternative splicing results in multiple transcript variants. ENSG00000169347 glycoprotein 2 NA
GADD45B 4616 This gene is a member of a group of genes whose transcript levels are increased following stressful growth arrest conditions and treatment with DNA-damaging agents. The genes in this group respond to environmental stresses by mediating activation of the p38/JNK pathway. This activation is mediated via their proteins binding and activating MTK1/MEKK4 kinase, which is an upstream activator of both p38 and JNK MAPKs. The function of these genes or their protein products is involved in the regulation of growth and apoptosis. These genes are regulated by different mechanisms, but they are often coordinately expressed and can function cooperatively in inhibiting cell growth. ENSG00000099860 growth arrest and DNA damage inducible beta NA
STMN1 3925 This gene belongs to the stathmin family of genes. It encodes a ubiquitous cytosolic phosphoprotein proposed to function as an intracellular relay integrating regulatory signals of the cellular environment. The encoded protein is involved in the regulation of the microtubule filament system by destabilizing microtubules. It prevents assembly and promotes disassembly of microtubules. Multiple transcript variants encoding different isoforms have been found for this gene. ENSG00000117632 stathmin 1 NA
TPM1 7168 This gene is a member of the tropomyosin family of highly conserved, widely distributed actin-binding proteins involved in the contractile system of striated and smooth muscles and the cytoskeleton of non-muscle cells. Tropomyosin is composed of two alpha-helical chains arranged as a coiled-coil. It is polymerized end to end along the two grooves of actin filaments and provides stability to the filaments. The encoded protein is one type of alpha helical chain that forms the predominant tropomyosin of striated muscle, where it also functions in association with the troponin complex to regulate the calcium-dependent interaction of actin and myosin during muscle contraction. In smooth muscle and non-muscle cells, alternatively spliced transcript variants encoding a range of isoforms have been described. Mutations in this gene are associated with type 3 familial hypertrophic cardiomyopathy. ENSG00000140416 tropomyosin 1 (alpha) NA
YARS 8565 Aminoacyl-tRNA synthetases catalyze the aminoacylation of tRNA by their cognate amino acid. Because of their central role in linking amino acids with nucleotide triplets contained in tRNAs, aminoacyl-tRNA synthetases are thought to be among the first proteins that appeared in evolution. Tyrosyl-tRNA synthetase belongs to the class I tRNA synthetase family. Cytokine activities have also been observed for the human tyrosyl-tRNA synthetase, after it is split into two parts, an N-terminal fragment that harbors the catalytic site and a C-terminal fragment found only in the mammalian enzyme. The N-terminal fragment is an interleukin-8-like cytokine, whereas the released C-terminal fragment is an EMAP II-like cytokine. ENSG00000134684 tyrosyl-tRNA synthetase NA
IGF2R 3482 This gene encodes a receptor for both insulin-like growth factor 2 and mannose 6-phosphate. The binding sites for each ligand are located on different segments of the protein. This receptor has various functions, including in the intracellular trafficking of lysosomal enzymes, the activation of transforming growth factor beta, and the degradation of insulin-like growth factor 2. Mutation or loss of heterozygosity of this gene has been associated with risk of hepatocellular carcinoma. The orthologous mouse gene is imprinted and shows exclusive expression from the maternal allele; however, imprinting of the human gene may be polymorphic, as only a minority of individuals showed biased expression from the maternal allele (PMID:8267611). ENSG00000197081 insulin like growth factor 2 receptor NA
TINAGL1 64129 The protein encoded by this gene is similar in sequence to tubulointerstitial nephritis antigen, a secreted glycoprotein that is recognized by antibodies in some types of immune-related tubulointerstitial nephritis. Three transcript variants encoding different isoforms have been found for this gene. ENSG00000142910 tubulointerstitial nephritis antigen like 1 NA
FDX1 2230 This gene encodes a small iron-sulfur protein that transfers electrons from NADPH through ferredoxin reductase to mitochondrial cytochrome P450, involved in steroid, vitamin D, and bile acid metabolism. Pseudogenes of this functional gene are found on chromosomes 20 and 21. ENSG00000137714 ferredoxin 1 NA
GKN1 56287 The protein encoded by this gene is found to be down-regulated in human gastric cancer tissue as compared to normal gastric mucosa. ENSG00000169605 gastrokine 1 NA
NFIL3 4783 The protein encoded by this gene is a transcriptional regulator that binds as a homodimer to activating transcription factor (ATF) sites in many cellular and viral promoters. The encoded protein represses PER1 and PER2 expression and therefore plays a role in the regulation of circadian rhythm. Three transcript variants encoding the same protein have been found for this gene. ENSG00000165030 nuclear factor, interleukin 3 regulated NA
DDR2 4921 Receptor tyrosine kinases (RTKs) play a key role in the communication of cells with their microenvironment. These molecules are involved in the regulation of cell growth, differentiation, and metabolism. In several cases the biochemical mechanism by which RTKs transduce signals across the membrane has been shown to be ligand induced receptor oligomerization and subsequent intracellular phosphorylation. This autophosphorylation leads to phosphorylation of cytosolic targets as well as association with other molecules, which are involved in pleiotropic effects of signal transduction. RTKs have a tripartite structure with extracellular, transmembrane, and cytoplasmic regions. This gene encodes a member of a novel subclass of RTKs and contains a distinct extracellular region encompassing a factor VIII-like domain. Alternative splicing in the 5’ UTR results in multiple transcript variants encoding the same protein. ENSG00000162733 discoidin domain receptor tyrosine kinase 2 NA
CXCL8 3576 The protein encoded by this gene is a member of the CXC chemokine family. This chemokine is one of the major mediators of the inflammatory response. This chemokine is secreted by several cell types. It functions as a chemoattractant, and is also a potent angiogenic factor. This gene is believed to play a role in the pathogenesis of bronchiolitis, a common respiratory tract disease caused by viral infection. This gene and ten other members of the CXC chemokine gene family form a chemokine gene cluster in a region mapped to chromosome 4q. ENSG00000169429 C-X-C motif chemokine ligand 8 NA
TXLNA 200081 NA ENSG00000084652 taxilin alpha NA
C4A 720 This gene encodes the acidic form of complement factor 4, part of the classical activation pathway. The protein is expressed as a single chain precursor which is proteolytically cleaved into a trimer of alpha, beta, and gamma chains prior to secretion. The trimer provides a surface for interaction between the antigen-antibody complex and other complement components. The alpha chain is cleaved to release C4 anaphylatoxin, an antimicrobial peptide and a mediator of local inflammation. Deficiency of this protein is associated with systemic lupus erythematosus and type I diabetes mellitus. This gene localizes to the major histocompatibility complex (MHC) class III region on chromosome 6. Varying haplotypes of this gene cluster exist, such that individuals may have 1, 2, or 3 copies of this gene. Two transcript variants encoding different isoforms have been found for this gene. ENSG00000244731 complement component 4A (Rodgers blood group) NA
SPARCL1 8404 NA ENSG00000152583 SPARC like 1 NA
SPINT1 6692 The protein encoded by this gene is a member of the Kunitz family of serine protease inhibitors. The protein is a potent inhibitor specific for HGF activator and is thought to be involved in the regulation of the proteolytic activation of HGF in injured tissues. Alternative splicing results in multiple variants encoding different isoforms. ENSG00000166145 serine peptidase inhibitor, Kunitz type 1 NA
SBNO2 22904 NA ENSG00000064932 strawberry notch homolog 2 (Drosophila) NA
CDKN1A 1026 This gene encodes a potent cyclin-dependent kinase inhibitor. The encoded protein binds to and inhibits the activity of cyclin-cyclin-dependent kinase2 or -cyclin-dependent kinase4 complexes, and thus functions as a regulator of cell cycle progression at G1. The expression of this gene is tightly controlled by the tumor suppressor protein p53, through which this protein mediates the p53-dependent cell cycle G1 phase arrest in response to a variety of stress stimuli. This protein can interact with proliferating cell nuclear antigen, a DNA polymerase accessory factor, and plays a regulatory role in S phase DNA replication and DNA damage repair. This protein was reported to be specifically cleaved by CASP3-like caspases, which thus leads to a dramatic activation of cyclin-dependent kinase2, and may be instrumental in the execution of apoptosis following caspase activation. Mice that lack this gene have the ability to regenerate damaged or missing tissue. Multiple alternatively spliced variants have been found for this gene. ENSG00000124762 cyclin-dependent kinase inhibitor 1A NA
IL4R 3566 This gene encodes the alpha chain of the interleukin-4 receptor, a type I transmembrane protein that can bind interleukin 4 and interleukin 13 to regulate IgE production. The encoded protein also can bind interleukin 4 to promote differentiation of Th2 cells. A soluble form of the encoded protein can be produced by proteolysis of the membrane-bound protein, and this soluble form can inhibit IL4-mediated cell proliferation and IL5 upregulation by T-cells. Allelic variations in this gene have been associated with atopy, a condition that can manifest itself as allergic rhinitis, sinusitis, asthma, or eczema. Polymorphisms in this gene are also associated with resistance to human immunodeficiency virus type-1 infection. Alternate splicing results in multiple transcript variants. ENSG00000077238 interleukin 4 receptor NA
URB1 9875 NA ENSG00000142207 URB1 ribosome biogenesis 1 homolog (S. cerevisiae) NA
REG1B 5968 This gene is a type I subclass member of the Reg gene family. The Reg gene family is a multigene family grouped into four subclasses, types I, II, III and IV based on the primary structures of the encoded proteins. This gene encodes a protein secreted by the exocrine pancreas that is highly similar to the REG1A protein. The related REG1A protein is associated with islet cell regeneration and diabetogenesis, and may be involved in pancreatic lithogenesis. Reg family members REG1A, REGL, PAP and this gene are tandemly clustered on chromosome 2p12 and may have arisen from the same ancestral gene by gene duplication. ENSG00000172023 regenerating family member 1 beta NA
TFCP2L1 29842 NA ENSG00000115112 transcription factor CP2-like 1 NA
CRACR2B 283229 NA ENSG00000177685 calcium release activated channel regulator 2B NA
ARMC9 80210 NA ENSG00000135931 armadillo repeat containing 9 NA
ANXA1 301 This gene encodes a membrane-localized protein that binds phospholipids. This protein inhibits phospholipase A2 and has anti-inflammatory activity. Loss of function or expression of this gene has been detected in multiple tumors. ENSG00000135046 annexin A1 NA
KRT1 3848 The protein encoded by this gene is a member of the keratin gene family. The type II cytokeratins consist of basic or neutral proteins which are arranged in pairs of heterotypic keratin chains coexpressed during differentiation of simple and stratified epithelial tissues. This type II cytokeratin is specifically expressed in the spinous and granular layers of the epidermis with family member KRT10 and mutations in these genes have been associated with bullous congenital ichthyosiform erythroderma. The type II cytokeratins are clustered in a region of chromosome 12q12-q13. ENSG00000167768 keratin 1 NA
PRSS1 5644 This gene encodes a trypsinogen, which is a member of the trypsin family of serine proteases. This enzyme is secreted by the pancreas and cleaved to its active form in the small intestine. It is active on peptide linkages involving the carboxyl group of lysine or arginine. Mutations in this gene are associated with hereditary pancreatitis. This gene and several other trypsinogen genes are localized to the T cell receptor beta locus on chromosome 7. ENSG00000204983 protease, serine 1 NA
LAMB2 3913 Laminins, a family of extracellular matrix glycoproteins, are the major noncollagenous constituent of basement membranes. They have been implicated in a wide variety of biological processes including cell adhesion, differentiation, migration, signaling, neurite outgrowth and metastasis. Laminins, composed of 3 non identical chains: laminin alpha, beta and gamma (formerly A, B1, and B2, respectively), form a cruciform structure consisting of 3 short arms, each formed by a different chain, and a long arm composed of all 3 chains. Each laminin chain is a multidomain protein encoded by a distinct gene. Several isoforms of each chain have been described. Different alpha, beta and gamma chain isomers combine to give rise to different heterotrimeric laminin isoforms which are designated by Arabic numerals in the order of their discovery, i.e. alpha1beta1gamma1 heterotrimer is laminin 1. The biological functions of the different chains and trimer molecules are largely unknown, but some of the chains have been shown to differ with respect to their tissue distribution, presumably reflecting diverse functions in vivo. This gene encodes the beta chain isoform laminin, beta 2. The beta 2 chain contains the 7 structural domains typical of beta chains of laminin, including the short alpha region. However, unlike beta 1 chain, beta 2 has a more restricted tissue distribution. It is enriched in the basement membrane of muscles at the neuromuscular junctions, kidney glomerulus and vascular smooth muscle. Transgenic mice in which the beta 2 chain gene was inactivated by homologous recombination, showed defects in the maturation of neuromuscular junctions and impairment of glomerular filtration. Alternative splicing involving a non consensus 5’ splice site (gc) in the 5’ UTR of this gene has been reported. It was suggested that inefficient splicing of this first intron, which does not change the protein sequence, results in a greater abundance of the unspliced form of the transcript than the spliced form. The full-length nature of the spliced transcript is not known. ENSG00000172037 laminin subunit beta 2 NA
MARCKSL1 65108 This gene encodes a member of the myristoylated alanine-rich C-kinase substrate (MARCKS) family. Members of this family play a role in cytoskeletal regulation, protein kinase C signaling and calmodulin signaling. The encoded protein affects the formation of adherens junction. Alternative splicing results in multiple transcript variants. Pseudogenes of this gene are located on the long arm of chromosomes 6 and 10. ENSG00000175130 MARCKS like 1 NA
CPA1 1357 This gene encodes a member of the carboxypeptidase A family of zinc metalloproteases. This enzyme is produced in the pancreas and preferentially cleaves C-terminal branched-chain and aromatic amino acids from dietary proteins. This gene and several family members are present in a gene cluster on chromosome 7. Mutations in this gene may be linked to chronic pancreatitis, while elevated protein levels may be associated with pancreatic cancer. ENSG00000091704 carboxypeptidase A1 NA
LGALS4 3960 The galectins are a family of beta-galactoside-binding proteins implicated in modulating cell-cell and cell-matrix interactions. The expression of this gene is restricted to small intestine, colon, and rectum, and it is underexpressed in colorectal cancer. ENSG00000171747 galectin 4 NA
# Save the queried Ensembl gene IDs, one per line, for downstream use.
write.table(out$query, paste0("../utilities/GTEX2013_sparse_fac_sqrt/gene_names_clus_", 20, ".txt"),
            col.names = FALSE, row.names = FALSE, quote = FALSE)
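
The export step above can be sketched in a self-contained way. This is only an illustration under stated assumptions: `toy_query`, `factor_id` and `out_dir` are hypothetical stand-ins (the report itself writes `out$query` into `../utilities/GTEX2013_sparse_fac_sqrt/`), and a temporary directory is used so that nothing in the project tree is touched.

```r
# Hypothetical stand-ins for out$query and the factor number used in the file name.
toy_query <- c("ENSG00000086061", "ENSG00000145824", "ENSG00000224389")
factor_id <- 20

# Assumed output location: a scratch directory rather than the report's ../utilities path.
out_dir <- file.path(tempdir(), "gene_lists")
dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)

# Build the file name the same way the report does, then write one ID per line.
out_file <- file.path(out_dir, paste0("gene_names_clus_", factor_id, ".txt"))
write.table(toy_query, out_file,
            col.names = FALSE, row.names = FALSE, quote = FALSE)

# Read the file back to confirm that it holds exactly the IDs that were written.
written_ids <- readLines(out_file)
```

Creating the directory first avoids a failure when the target folder does not exist yet, and `file.path` keeps the path construction independent of the operating system.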

GTEx 2013 Factor analysis (sparse factors: voom counts)

lambda_out <- read.table("../sfa_outputs/GTEX2013_transpose/voom_gtex/gtex_voom_transpose_lambda.out");
f_out <- read.table("../sfa_outputs/GTEX2013_transpose/voom_gtex/gtex_voom_transpose_F.out");

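# Read the GTEx V6 gene identifiers as a plain character vector.
gene_names <- as.vector(as.matrix(read.table("../sfa_inputs/gene_names_GTEX_V6.txt")));
# Keep the first 15 characters (the stable Ensembl gene ID), dropping any version suffix.
gene_names <- substring(gene_names,1,15);
xli  <-  gene_names;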

indices_mat <- SFA.ExtractTopFeatures(lambda_out, top_features = 100, options="min", mult.annotate = TRUE)

gene_list <- do.call(rbind, lapply(seq_len(nrow(indices_mat)), function(x) gene_names[indices_mat[x,]]))
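
The row-wise lookup above turns a matrix of gene indices into a matrix of gene IDs with the same shape. A minimal sketch on toy data may make this concrete. It assumes, as the later `gene_list[1,]` call suggests, that the index matrix has one row per factor; `toy_ids` and `toy_indices` are hypothetical values, not taken from the GTEx files or from the output of `SFA.ExtractTopFeatures`.

```r
# Hypothetical versioned Ensembl IDs standing in for gene_names_GTEX_V6.txt.
toy_ids <- c("ENSG00000148677.5", "ENSG00000165887.7",
             "ENSG00000143632.10", "ENSG00000114854.3")

# The stable Ensembl gene ID is 15 characters long, so this drops the version suffix.
toy_ids <- substring(toy_ids, 1, 15)

# Hypothetical index matrix: one row per factor, one column per top-ranked gene.
toy_indices <- rbind(c(3, 1),
                     c(2, 4))

# Row-wise lookup in the same style as the report.
toy_list <- do.call(rbind, lapply(seq_len(nrow(toy_indices)),
                                  function(x) toy_ids[toy_indices[x, ]]))

# Equivalent vectorised form: index once, then restore the matrix shape.
toy_list_vec <- matrix(toy_ids[toy_indices], nrow = nrow(toy_indices))
```

Both forms give a 2 x 2 character matrix whose first row holds the IDs for the first factor, which is what `gene_list[1,]` passes to the annotation query.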

Factor 1 Annotations

out <- mygene::queryMany(gene_list[1,],  scopes="ensembl.gene", fields=c("name", "summary", "symbol"), species="human");
## Finished
## Pass returnall=TRUE to return lists of duplicate or missing query terms.
kable(as.data.frame(out))
name summary X_id query symbol notfound
ankyrin repeat domain 1 The protein encoded by this gene is localized to the nucleus of endothelial cells and is induced by IL-1 and TNF-alpha stimulation. Studies in rat cardiomyocytes suggest that this gene functions as a transcription factor. Interactions between this protein and the sarcomeric proteins myopalladin and titin suggest that it may also be involved in the myofibrillar stretch-sensor system. 27063 ENSG00000148677 ANKRD1 NA
ankyrin repeat domain 2 This gene encodes a protein that belongs to the muscle ankyrin repeat protein (MARP) family. A similar gene in rodents is a component of a muscle stress response pathway and plays a role in the stretch-response associated with slow muscle function. Alternative splicing results in multiple transcript variants encoding different isoforms. 26287 ENSG00000165887 ANKRD2 NA
actin, alpha 1, skeletal muscle The product encoded by this gene belongs to the actin family of proteins, which are highly conserved proteins that play a role in cell motility, structure and integrity. Alpha, beta and gamma actin isoforms have been identified, with alpha actins being a major constituent of the contractile apparatus, while beta and gamma actins are involved in the regulation of cell motility. This actin is an alpha actin that is found in skeletal muscle. Mutations in this gene cause nemaline myopathy type 3, congenital myopathy with excess of thin myofilaments, congenital myopathy with cores, and congenital myopathy with fiber-type disproportion, diseases that lead to muscle fiber defects. 58 ENSG00000143632 ACTA1 NA
troponin C1, slow skeletal and cardiac type Troponin is a central regulatory protein of striated muscle contraction, and together with tropomyosin, is located on the actin filament. Troponin consists of 3 subunits: TnI, which is the inhibitor of actomyosin ATPase; TnT, which contains the binding site for tropomyosin; and TnC, the protein encoded by this gene. The binding of calcium to TnC abolishes the inhibitory action of TnI, thus allowing the interaction of actin with myosin, the hydrolysis of ATP, and the generation of tension. Mutations in this gene are associated with cardiomyopathy dilated type 1Z. 7134 ENSG00000114854 TNNC1 NA
NA NA ENSG00000215861 ENSG00000215861 WI2-1896O14.1 NA
myoglobin This gene encodes a member of the globin superfamily and is expressed in skeletal and cardiac muscles. The encoded protein is a haemoprotein contributing to intracellular oxygen storage and transcellular facilitated diffusion of oxygen. At least three alternatively spliced transcript variants encoding the same protein have been reported. 4151 ENSG00000198125 MB NA
myosin, heavy chain 7, cardiac muscle, beta Muscle myosin is a hexameric protein containing 2 heavy chain subunits, 2 alkali light chain subunits, and 2 regulatory light chain subunits. This gene encodes the beta (or slow) heavy chain subunit of cardiac myosin. It is expressed predominantly in normal human ventricle. It is also expressed in skeletal muscle tissues rich in slow-twitch type I muscle fibers. Changes in the relative abundance of this protein and the alpha (or fast) heavy subunit of cardiac myosin correlate with the contractile velocity of cardiac muscle. Its expression is also altered during thyroid hormone depletion and hemodynamic overloading. Mutations in this gene are associated with familial hypertrophic cardiomyopathy, myosin storage myopathy, dilated cardiomyopathy, and Laing early-onset distal myopathy. 4625 ENSG00000092054 MYH7 NA
CD22 molecule NA 933 ENSG00000012124 CD22 NA
NA NA ENSG00000258444 ENSG00000258444 CTD-2201G16.1 NA
dedicator of cytokinesis 8 This gene encodes a member of the DOCK180 family of guanine nucleotide exchange factors. Guanine nucleotide exchange factors interact with Rho GTPases and are components of intracellular signaling networks. Mutations in this gene result in the autosomal recessive form of the hyper-IgE syndrome. Alternatively spliced transcript variants encoding different isoforms have been described. 81704 ENSG00000107099 DOCK8 NA
myosin light chain 2 This gene encodes the regulatory light chain associated with cardiac myosin beta (or slow) heavy chain. Ca2+ triggers the phosphorylation of regulatory light chain that in turn triggers contraction. Mutations in this gene are associated with mid-left ventricular chamber type hypertrophic cardiomyopathy. 4633 ENSG00000111245 MYL2 NA
cytochrome c oxidase subunit 6A2 Cytochrome c oxidase (COX), the terminal enzyme of the mitochondrial respiratory chain, catalyzes the electron transfer from reduced cytochrome c to oxygen. It is a heteromeric complex consisting of 3 catalytic subunits encoded by mitochondrial genes and multiple structural subunits encoded by nuclear genes. The mitochondrially-encoded subunits function in electron transfer, and the nuclear-encoded subunits may be involved in the regulation and assembly of the complex. This nuclear gene encodes polypeptide 2 (heart/muscle isoform) of subunit VIa, and polypeptide 2 is present only in striated muscles. Polypeptide 1 (liver isoform) of subunit VIa is encoded by a different gene, and is found in all non-muscle tissues. These two polypeptides share 66% amino acid sequence identity. 1339 ENSG00000156885 COX6A2 NA
cysteine and glycine rich protein 3 This gene encodes a member of the CSRP family of LIM domain proteins, which may be involved in regulatory processes important for development and cellular differentiation. The LIM/double zinc-finger motif found in this protein is found in a group of proteins with critical functions in gene regulation, cell growth, and somatic differentiation. Mutations in this gene are thought to cause heritable forms of hypertrophic cardiomyopathy (HCM) and dilated cardiomyopathy (DCM) in humans. Alternatively spliced transcript variants with different 5’ UTR, but encoding the same protein, have been found for this gene. 8048 ENSG00000129170 CSRP3 NA
ATP binding cassette subfamily B member 1 The membrane-associated protein encoded by this gene is a member of the superfamily of ATP-binding cassette (ABC) transporters. ABC proteins transport various molecules across extra- and intra-cellular membranes. ABC genes are divided into seven distinct subfamilies (ABC1, MDR/TAP, MRP, ALD, OABP, GCN20, White). This protein is a member of the MDR/TAP subfamily. Members of the MDR/TAP subfamily are involved in multidrug resistance. The protein encoded by this gene is an ATP-dependent drug efflux pump for xenobiotic compounds with broad substrate specificity. It is responsible for decreased drug accumulation in multidrug-resistant cells and often mediates the development of resistance to anticancer drugs. This protein also functions as a transporter in the blood-brain barrier. 5243 ENSG00000085563 ABCB1 NA
creatine kinase, M-type The protein encoded by this gene is a cytoplasmic enzyme involved in energy homeostasis and is an important serum marker for myocardial infarction. The encoded protein reversibly catalyzes the transfer of phosphate between ATP and various phosphogens such as creatine phosphate. It acts as a homodimer in striated muscle as well as in other tissues, and as a heterodimer with a similar brain isozyme in heart. The encoded protein is a member of the ATP:guanido phosphotransferase protein family. 1158 ENSG00000104879 CKM NA
cytokine receptor like factor 1 This gene encodes a member of the cytokine type I receptor family. The protein forms a secreted complex with cardiotrophin-like cytokine factor 1 and acts on cells expressing ciliary neurotrophic factor receptors. The complex can promote survival of neuronal cells. Mutations in this gene result in Crisponi syndrome and cold-induced sweating syndrome. 9244 ENSG00000006016 CRLF1 NA
actin, alpha, cardiac muscle 1 Actins are highly conserved proteins that are involved in various types of cell motility. Polymerization of globular actin (G-actin) leads to a structural filament (F-actin) in the form of a two-stranded helix. Each actin can bind to four others. The protein encoded by this gene belongs to the actin family which is comprised of three main groups of actin isoforms, alpha, beta, and gamma. The alpha actins are found in muscle tissues and are a major constituent of the contractile apparatus. Defects in this gene have been associated with idiopathic dilated cardiomyopathy (IDC) and familial hypertrophic cardiomyopathy (FHC). 70 ENSG00000159251 ACTC1 NA
titin-cap Sarcomere assembly is regulated by the muscle protein titin. Titin is a giant elastic protein with kinase activity that extends half the length of a sarcomere. It serves as a scaffold to which myofibrils and other muscle related proteins are attached. This gene encodes a protein found in striated and cardiac muscle that binds to the titin Z1-Z2 domains and is a substrate of titin kinase, interactions thought to be critical to sarcomere assembly. Mutations in this gene are associated with limb-girdle muscular dystrophy type 2G. 8557 ENSG00000173991 TCAP NA
Kazal type serine peptidase inhibitor domain 1 This gene encodes a secreted member of the insulin growth factor-binding protein (IGFBP) superfamily. The protein contains an insulin growth factor-binding domain in its N-terminal region, a Kazal-type serine protease inhibitor and follistatin-like domain in its central region, and an immunoglobulin-like domain in its C-terminal region. Studies of the mouse ortholog suggest that this protein may function in bone development and bone regeneration. This gene is hypomethylated and over-expressed in high-grade glioma compared to low-grade glioma, and thus the hypomethylated gene may be associated with cell proliferation and the shorter survival of patients with high-grade glioma. It is also one of numerous genes found to be deleted in a novel 5.54 Mb interstitial deletion, which is associated with multiple congenital anomalies. Alternative splicing results in multiple transcript variants. 81621 ENSG00000107821 KAZALD1 NA
ADP-ribosylhydrolase like 1 ADP-ribosylation is a reversible posttranslational modification used to regulate protein function. ADP-ribosyltransferases (see ART1; MIM 601625) transfer ADP-ribose from NAD+ to the target protein, and ADP-ribosylhydrolases, such as ADPRHL1, reverse the reaction (Glowacki et al., 2002 [PubMed 12070318]). 113622 ENSG00000153531 ADPRHL1 NA
G protein-coupled receptor 183 This gene was identified by the up-regulation of its expression upon Epstein-Barr virus infection of primary B lymphocytes. This gene is predicted to encode a G protein-coupled receptor that is most closely related to the thrombin receptor. Expression of this gene was detected in B-lymphocyte cell lines and lymphoid tissues but not in T-lymphocyte cell lines or peripheral blood T lymphocytes. The function of this gene is unknown. 1880 ENSG00000169508 GPR183 NA
whirlin This gene is thought to function in the organization and stabilization of stereocilia elongation and actin cytoskeletal assembly, based on studies of the related mouse gene. Mutations in this gene have been associated with autosomal recessive non-syndromic deafness and Usher Syndrome. Alternative splicing of this gene results in multiple transcript variants encoding different isoforms. 25861 ENSG00000095397 WHRN NA
integrin subunit beta like 1 This gene encodes a beta integrin-related protein that is a member of the EGF-like protein family. The encoded protein contains integrin-like cysteine-rich repeats. Alternative splicing results in multiple transcript variants. 9358 ENSG00000198542 ITGBL1 NA
NA NA NA ENSG00000180672 NA TRUE
bone marrow stromal cell antigen 2 Bone marrow stromal cells are involved in the growth and development of B-cells. The specific function of the protein encoded by the bone marrow stromal cell antigen 2 is undetermined; however, this protein may play a role in pre-B-cell growth and in rheumatoid arthritis. 684 ENSG00000130303 BST2 NA
myozenin 2 The protein encoded by this gene belongs to a family of sarcomeric proteins that bind to calcineurin, a phosphatase involved in calcium-dependent signal transduction in diverse cell types. These family members tether calcineurin to alpha-actinin at the z-line of the sarcomere of cardiac and skeletal muscle cells, and thus they are important for calcineurin signaling. Mutations in this gene cause cardiomyopathy familial hypertrophic type 16, a hereditary heart disorder. 51778 ENSG00000172399 MYOZ2 NA
protein tyrosine phosphatase, non-receptor type 3 The protein encoded by this gene is a member of the protein tyrosine phosphatase (PTP) family. PTPs are known to be signaling molecules that regulate a variety of cellular processes including cell growth, differentiation, mitotic cycle, and oncogenic transformation. This protein contains a C-terminal PTP domain and an N-terminal domain homologous to the band 4.1 superfamily of cytoskeletal-associated proteins. P97, a cell cycle regulator involved in a variety of membrane related functions, has been shown to be a substrate of this PTP. This PTP was also found to interact with, and be regulated by adaptor protein 14-3-3 beta. Several alternatively spliced transcript variants encoding different isoforms have been found for this gene. 5774 ENSG00000070159 PTPN3 NA
colorectal neoplasia differentially expressed (non-protein coding) NA ENSG00000245694 ENSG00000245694 CRNDE NA
ankyrin 1 Ankyrins are a family of proteins that link the integral membrane proteins to the underlying spectrin-actin cytoskeleton and play key roles in activities such as cell motility, activation, proliferation, contact and the maintenance of specialized membrane domains. Multiple isoforms of ankyrin with different affinities for various target proteins are expressed in a tissue-specific, developmentally regulated manner. Most ankyrins are typically composed of three structural domains: an amino-terminal domain containing multiple ankyrin repeats; a central region with a highly conserved spectrin binding domain; and a carboxy-terminal regulatory domain which is the least conserved and subject to variation. Ankyrin 1, the prototype of this family, was first discovered in the erythrocytes, but has since also been found in brain and muscles. Mutations in erythrocytic ankyrin 1 have been found in approximately half of all patients with hereditary spherocytosis. Complex patterns of alternative splicing in the regulatory domain, giving rise to different isoforms of ankyrin 1, have been described. Truncated muscle-specific isoforms of ankyrin 1 resulting from usage of an alternate promoter have also been identified. 286 ENSG00000029534 ANK1 NA
nebulin related anchoring protein NA 4892 ENSG00000197893 NRAP NA
uncharacterized LOC105370792 NA 105370792 ENSG00000174171 LOC105370792 NA
NADH dehydrogenase (ubiquinone) 1 alpha subcomplex, 4-like 2 NA 56901 ENSG00000185633 NDUFA4L2 NA
Purkinje cell protein 4 like 1 NA 654790 ENSG00000248485 PCP4L1 NA
guanylate binding protein 2 This gene belongs to the guanine-binding protein (GBP) family, which includes interferon-induced proteins that can bind to guanine nucleotides (GMP, GDP and GTP). The encoded protein is a GTPase which hydrolyzes GTP, predominantly to GDP. The protein may play a role as a marker of squamous cell carcinomas. 2634 ENSG00000162645 GBP2 NA
death associated protein kinase 1 Death-associated protein kinase 1 is a positive mediator of gamma-interferon induced programmed cell death. DAPK1 encodes a structurally unique 160-kD calmodulin dependent serine-threonine kinase that carries 8 ankyrin repeats and 2 putative P-loop consensus sites. It is a tumor suppressor candidate. Alternative splicing results in multiple transcript variants. 1612 ENSG00000196730 DAPK1 NA
transient receptor potential cation channel subfamily M member 4 The protein encoded by this gene is a calcium-activated nonselective ion channel that mediates transport of monovalent cations across membranes, thereby depolarizing the membrane. The activity of the encoded protein increases with increasing intracellular calcium concentration, but this channel does not transport calcium. 54795 ENSG00000130529 TRPM4 NA
sushi domain containing 2 NA 56241 ENSG00000099994 SUSD2 NA
NA NA NA ENSG00000269640 NA TRUE
NA NA ENSG00000250654 ENSG00000250654 RP11-834C11.7 NA
phosphodiesterase 4D interacting protein The protein encoded by this gene serves to anchor phosphodiesterase 4D to the Golgi/centrosome region of the cell. Defects in this gene may be a cause of myeloproliferative disorder (MBD) associated with eosinophilia. Several transcript variants encoding different isoforms have been found for this gene. 9659 ENSG00000178104 PDE4DIP NA
NA NA ENSG00000250900 ENSG00000250900 CTC-338M12.6 NA
RAS like family 11 member B RASL11B is a member of the small GTPase protein family with a high degree of similarity to RAS (see HRAS, MIM 190020) proteins. 65997 ENSG00000128045 RASL11B NA
interleukin 17 receptor E This gene encodes a transmembrane protein that functions as the receptor for interleukin-17C. The encoded protein signals to downstream components of the mitogen activated protein kinase (MAPK) pathway. Activity of this protein is important in the immune response to bacterial pathogens. Alternatively spliced transcript variants have been described for this gene. 132014 ENSG00000163701 IL17RE NA
neuropilin 2 This gene encodes a member of the neuropilin family of receptor proteins. The encoded transmembrane protein binds to SEMA3C protein {sema domain, immunoglobulin domain (Ig), short basic domain, secreted, (semaphorin) 3C} and SEMA3F protein {sema domain, immunoglobulin domain (Ig), short basic domain, secreted, (semaphorin) 3F}, and interacts with vascular endothelial growth factor (VEGF). This protein may play a role in cardiovascular development, axon guidance, and tumorigenesis. Multiple transcript variants encoding distinct isoforms have been identified for this gene. 8828 ENSG00000118257 NRP2 NA
tripartite motif containing 54 The protein encoded by this gene contains a RING finger motif and is highly similar to the ring finger proteins RNF28/MURF1 and RNF29/MURF2. In vitro studies demonstrated that this protein, RNF28, and RNF29 form heterodimers, which may be important for the regulation of titin kinase and microtubule-dependent signal pathways in striated muscles. Alternatively spliced transcript variants encoding distinct isoforms have been reported. 57159 ENSG00000138100 TRIM54 NA
cytochrome c oxidase subunit 7A1 Cytochrome c oxidase (COX), the terminal component of the mitochondrial respiratory chain, catalyzes the electron transfer from reduced cytochrome c to oxygen. This component is a heteromeric complex consisting of 3 catalytic subunits encoded by mitochondrial genes and multiple structural subunits encoded by nuclear genes. The mitochondrially-encoded subunits function in electron transfer, and the nuclear-encoded subunits may function in the regulation and assembly of the complex. This nuclear gene encodes polypeptide 1 (muscle isoform) of subunit VIIa and the polypeptide 1 is present only in muscle tissues. Other polypeptides of subunit VIIa are present in both muscle and nonmuscle tissues, and are encoded by different genes. 1346 ENSG00000161281 COX7A1 NA
troponin T1, slow skeletal type This gene encodes a protein that is a subunit of troponin, which is a regulatory complex located on the thin filament of the sarcomere. This complex regulates striated muscle contraction in response to fluctuations in intracellular calcium concentration. This complex is composed of three subunits: troponin C, which binds calcium, troponin T, which binds tropomyosin, and troponin I, which is an inhibitory subunit. This protein is the slow skeletal troponin T subunit. Mutations in this gene cause nemaline myopathy type 5, also known as Amish nemaline myopathy, a neuromuscular disorder characterized by muscle weakness and rod-shaped, or nemaline, inclusions in skeletal muscle fibers which affects infants, resulting in death due to respiratory insufficiency, usually in the second year. Multiple transcript variants encoding different isoforms have been found for this gene. 7138 ENSG00000105048 TNNT1 NA
regulator of calcineurin 2 This gene encodes a member of the regulator of calcineurin (RCAN) protein family. These proteins play a role in many physiological processes by binding to the catalytic domain of calcineurin A, inhibiting calcineurin-mediated nuclear translocation of the transcription factor NFATC1. Expression of this gene in skin fibroblasts is upregulated by thyroid hormone, and the encoded protein may also play a role in endothelial cell function and angiogenesis. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene. 10231 ENSG00000172348 RCAN2 NA
tripartite motif containing 7 The protein encoded by this gene is a member of the tripartite motif (TRIM) family. The TRIM motif includes three zinc-binding domains, a RING, a B-box type 1, a B-box type 2, and a coiled-coil region. The protein localizes to both the nucleus and the cytoplasm, and may represent a participant in the initiation of glycogen synthesis. Alternative splicing results in multiple transcript variants. 81786 ENSG00000146054 TRIM7 NA
crystallin alpha B Mammalian lens crystallins are divided into alpha, beta, and gamma families. Alpha crystallins are composed of two gene products: alpha-A and alpha-B, for acidic and basic, respectively. Alpha crystallins can be induced by heat shock and are members of the small heat shock protein (HSP20) family. They act as molecular chaperones although they do not renature proteins and release them in the fashion of a true chaperone; instead they hold them in large soluble aggregates. Post-translational modifications decrease the ability to chaperone. These heterogeneous aggregates consist of 30-40 subunits; the alpha-A and alpha-B subunits have a 3:1 ratio, respectively. Two additional functions of alpha crystallins are an autokinase activity and participation in the intracellular architecture. The encoded protein has been identified as a moonlighting protein based on its ability to perform mechanistically distinct functions. Alpha-A and alpha-B gene products are differentially expressed; alpha-A is preferentially restricted to the lens and alpha-B is expressed widely in many tissues and organs. Elevated expression of alpha-B crystallin occurs in many neurological diseases; a missense mutation cosegregated in a family with a desmin-related myopathy. Alternative splicing results in multiple transcript variants. 1410 ENSG00000109846 CRYAB NA
pleckstrin NA 5341 ENSG00000115956 PLEK NA
Ras association domain family member 2 This gene encodes a protein that contains a Ras association domain. Similar to its cattle and sheep counterparts, this gene is located near the prion gene. Two alternatively spliced transcripts encoding the same isoform have been reported. 9770 ENSG00000101265 RASSF2 NA
latent transforming growth factor beta binding protein 2 The protein encoded by this gene belongs to the family of latent transforming growth factor (TGF)-beta binding proteins (LTBP), which are extracellular matrix proteins with multi-domain structure. This protein is the largest member of the LTBP family possessing unique regions and with most similarity to the fibrillins. It has thus been suggested that it may have multiple functions: as a member of the TGF-beta latent complex, as a structural component of microfibrils, and a role in cell adhesion. 4053 ENSG00000119681 LTBP2 NA
heat shock protein family A (Hsp70) member 7 NA ENSG00000225217 ENSG00000225217 HSPA7 NA
fatty acid binding protein 3 The intracellular fatty acid-binding proteins (FABPs) belong to a multigene family. FABPs are divided into at least three distinct types, namely the hepatic-, intestinal- and cardiac-type. They form 14-15 kDa proteins and are thought to participate in the uptake, intracellular metabolism and/or transport of long-chain fatty acids. They may also be responsible for the modulation of cell growth and proliferation. Fatty acid-binding protein 3 gene contains four exons and its function is to arrest growth of mammary epithelial cells. This gene is a candidate tumor suppressor gene for human breast cancer. Alternative splicing results in multiple transcript variants. 2170 ENSG00000121769 FABP3 NA
phosphatidylethanolamine binding protein 4 The phosphatidylethanolamine (PE)-binding proteins, including PEBP4, are an evolutionarily conserved family of proteins with pivotal biologic functions, such as lipid binding and inhibition of serine proteases (Wang et al., 2004 [PubMed 15302887]). 157310 ENSG00000134020 PEBP4 NA
tubulin beta 4A class IVa This gene encodes a member of the beta tubulin family. Beta tubulins are one of two core protein families (alpha and beta tubulins) that heterodimerize and assemble to form microtubules. Mutations in this gene cause hypomyelinating leukodystrophy-6 and autosomal dominant torsion dystonia-4. Alternate splicing results in multiple transcript variants encoding different isoforms. A pseudogene of this gene is found on chromosome X. 10382 ENSG00000104833 TUBB4A NA
G protein-coupled receptor 176 Members of the G protein-coupled receptor family, such as GPR176, are cell surface receptors involved in responses to hormones, growth factors, and neurotransmitters (Hata et al., 1995 [PubMed 7893747]). 11245 ENSG00000166073 GPR176 NA
dickkopf WNT signaling pathway inhibitor 3 This gene encodes a protein that is a member of the dickkopf family. The secreted protein contains two cysteine rich regions and is involved in embryonic development through its interactions with the Wnt signaling pathway. The expression of this gene is decreased in a variety of cancer cell lines and it may function as a tumor suppressor gene. Alternative splicing results in multiple transcript variants encoding the same protein. 27122 ENSG00000050165 DKK3 NA
transmembrane protein 182 NA 130827 ENSG00000170417 TMEM182 NA
tumor necrosis factor receptor superfamily member 12A NA 51330 ENSG00000006327 TNFRSF12A NA
pleckstrin and Sec7 domain containing This gene encodes a pleckstrin homology and SEC7 domains-containing protein that functions as a guanine nucleotide exchange factor. The encoded protein regulates signal transduction by activating ADP-ribosylation factor 6. Alternative splicing results in multiple transcript variants. 5662 ENSG00000059915 PSD NA
integrin subunit alpha M This gene encodes the integrin alpha M chain. Integrins are heterodimeric integral membrane proteins composed of an alpha chain and a beta chain. This I-domain containing alpha integrin combines with the beta 2 chain (ITGB2) to form a leukocyte-specific integrin referred to as macrophage receptor 1 (‘Mac-1’), or inactivated-C3b (iC3b) receptor 3 (‘CR3’). The alpha M beta 2 integrin is important in the adherence of neutrophils and monocytes to stimulated endothelium, and also in the phagocytosis of complement coated particles. Multiple transcript variants encoding different isoforms have been found for this gene. 3684 ENSG00000169896 ITGAM NA
regulator of G-protein signaling 1 This gene encodes a member of the regulator of G-protein signalling family. This protein is located on the cytosolic side of the plasma membrane and contains a conserved, 120 amino acid motif called the RGS domain. The protein attenuates the signalling activity of G-proteins by binding to activated, GTP-bound G alpha subunits and acting as a GTPase activating protein (GAP), increasing the rate of conversion of the GTP to GDP. This hydrolysis allows the G alpha subunits to bind G beta/gamma subunit heterodimers, forming inactive G-protein heterotrimers, thereby terminating the signal. 5996 ENSG00000090104 RGS1 NA
regulator of G-protein signaling 9 This gene encodes a member of the RGS family of GTPase activating proteins that function in various signaling pathways by accelerating the deactivation of G proteins. This protein is anchored to photoreceptor membranes in retinal cells and deactivates G proteins in the rod and cone phototransduction cascades. Mutations in this gene result in bradyopsia. Multiple transcript variants encoding different isoforms have been found for this gene. 8787 ENSG00000108370 RGS9 NA
NA NA ENSG00000272463 ENSG00000272463 RP11-532F6.3 NA
LIM and cysteine rich domains 1 This gene encodes a member of the LIM-domain family of zinc finger proteins. The encoded protein contains an N-terminal cysteine-rich domain and two C-terminal LIM domains. The presence of LIM domains suggests involvement in protein-protein interactions. The protein may act as a co-regulator of transcription along with other transcription factors. Alternate splicing results in multiple transcript variants of this gene. 29995 ENSG00000071282 LMCD1 NA
growth arrest specific 6 This gene encodes a gamma-carboxyglutamic acid (Gla)-containing protein thought to be involved in the stimulation of cell proliferation. This gene is frequently overexpressed in many cancers and has been implicated as an adverse prognostic marker. Elevated protein levels are additionally associated with a variety of disease states, including venous thromboembolic disease, systemic lupus erythematosus, chronic renal failure, and preeclampsia. 2621 ENSG00000183087 GAS6 NA
uncharacterized LOC100507002 NA 100507002 ENSG00000263470 LOC100507002 NA
myosin, heavy chain 6, cardiac muscle, alpha Cardiac muscle myosin is a hexamer consisting of two heavy chain subunits, two light chain subunits, and two regulatory subunits. This gene encodes the alpha heavy chain subunit of cardiac myosin. The gene is located 4kb downstream of the gene encoding the beta heavy chain subunit of cardiac myosin. Mutations in this gene cause familial hypertrophic cardiomyopathy and atrial septal defect 3. 4624 ENSG00000197616 MYH6 NA
CD53 molecule The protein encoded by this gene is a member of the transmembrane 4 superfamily, also known as the tetraspanin family. Most of these members are cell-surface proteins that are characterized by the presence of four hydrophobic domains. The proteins mediate signal transduction events that play a role in the regulation of cell development, activation, growth and motility. This encoded protein is a cell surface glycoprotein that is known to complex with integrins. It contributes to the transduction of CD2-generated signals in T cells and natural killer cells and has been suggested to play a role in growth regulation. Familial deficiency of this gene has been linked to an immunodeficiency associated with recurrent infectious diseases caused by bacteria, fungi and viruses. Alternative splicing results in multiple transcript variants. 963 ENSG00000143119 CD53 NA
ATPase Na+/K+ transporting subunit beta 2 The protein encoded by this gene belongs to the family of Na+/K+ and H+/K+ ATPases beta chain proteins, and to the subfamily of Na+/K+ -ATPases. Na+/K+ -ATPase is an integral membrane protein responsible for establishing and maintaining the electrochemical gradients of Na and K ions across the plasma membrane. These gradients are essential for osmoregulation, for sodium-coupled transport of a variety of organic and inorganic molecules, and for electrical excitability of nerve and muscle. This enzyme is composed of two subunits, a large catalytic subunit (alpha) and a smaller glycoprotein subunit (beta). The beta subunit regulates, through assembly of alpha/beta heterodimers, the number of sodium pumps transported to the plasma membrane. The glycoprotein subunit of Na+/K+ -ATPase is encoded by multiple genes. This gene encodes a beta 2 subunit. Two transcript variants encoding different isoforms have been found for this gene. 482 ENSG00000129244 ATP1B2 NA
creatine kinase, mitochondrial 2 Mitochondrial creatine kinase (MtCK) is responsible for the transfer of high energy phosphate from mitochondria to the cytosolic carrier, creatine. It belongs to the creatine kinase isoenzyme family. It exists as two isoenzymes, sarcomeric MtCK and ubiquitous MtCK, encoded by separate genes. Mitochondrial creatine kinase occurs in two different oligomeric forms: dimers and octamers, in contrast to the exclusively dimeric cytosolic creatine kinase isoenzymes. Sarcomeric mitochondrial creatine kinase has 80% homology with the coding exons of ubiquitous mitochondrial creatine kinase. This gene contains sequences homologous to several motifs that are shared among some nuclear genes encoding mitochondrial proteins and thus may be essential for the coordinated activation of these genes during mitochondrial biogenesis. Three transcript variants encoding the same protein have been found for this gene. 1160 ENSG00000131730 CKMT2 NA
REST corepressor 2 NA 283248 ENSG00000167771 RCOR2 NA
NA NA ENSG00000225792 ENSG00000225792 AC004540.4 NA
titin This gene encodes a large abundant protein of striated muscle. The product of this gene is divided into two regions, an N-terminal I-band and a C-terminal A-band. The I-band, which is the elastic part of the molecule, contains two regions of tandem immunoglobulin domains on either side of a PEVK region that is rich in proline, glutamate, valine and lysine. The A-band, which is thought to act as a protein-ruler, contains a mixture of immunoglobulin and fibronectin repeats, and possesses kinase activity. An N-terminal Z-disc region and a C-terminal M-line region bind to the Z-line and M-line of the sarcomere, respectively, so that a single titin molecule spans half the length of a sarcomere. Titin also contains binding sites for muscle associated proteins so it serves as an adhesion template for the assembly of contractile machinery in muscle cells. It has also been identified as a structural protein for chromosomes. Alternative splicing of this gene results in multiple transcript variants. Considerable variability exists in the I-band, the M-line and the Z-disc regions of titin. Variability in the I-band region contributes to the differences in elasticity of different titin isoforms and, therefore, to the differences in elasticity of different muscle types. Mutations in this gene are associated with familial hypertrophic cardiomyopathy 9, and autoantibodies to titin are produced in patients with the autoimmune disease scleroderma. 7273 ENSG00000155657 TTN NA
NA NA NA ENSG00000272003 NA TRUE
NA NA ENSG00000254539 ENSG00000254539 RP4-791M13.3 NA
calcium/calmodulin dependent protein kinase II beta The product of this gene belongs to the serine/threonine protein kinase family and to the Ca(2+)/calmodulin-dependent protein kinase subfamily. Calcium signaling is crucial for several aspects of plasticity at glutamatergic synapses. In mammalian cells, the enzyme is composed of four different chains: alpha, beta, gamma, and delta. The product of this gene is a beta chain. It is possible that distinct isoforms of this chain have different cellular localizations and interact differently with calmodulin. Alternative splicing results in multiple transcript variants. 816 ENSG00000058404 CAMK2B NA
carbohydrate (N-acetylgalactosamine 4-sulfate 6-O) sulfotransferase 15 Chondroitin sulfate (CS) is a glycosaminoglycan which is an important structural component of the extracellular matrix and which links to proteins to form proteoglycans. Chondroitin sulfate E (CS-E) is an isomer of chondroitin sulfate in which the C-4 and C-6 hydroxyl groups are sulfated. This gene encodes a type II transmembrane glycoprotein that acts as a sulfotransferase to transfer sulfate to the C-6 hydroxyl group of chondroitin sulfate. This gene has also been identified as being co-expressed with RAG1 in B-cells and as potentially acting as a B-cell surface signaling receptor. Alternative splicing results in multiple transcript variants encoding distinct isoforms. 51363 ENSG00000182022 CHST15 NA
PTPRF interacting protein alpha 4 PPFIA4, or liprin-alpha-4, belongs to the liprin-alpha gene family. See liprin-alpha-1 (LIP1, or PPFIA1; MIM 611054) for background on liprins. 8497 ENSG00000143847 PPFIA4 NA
lymphocyte cytosolic protein 1 Plastins are a family of actin-binding proteins that are conserved throughout eukaryote evolution and expressed in most tissues of higher eukaryotes. In humans, two ubiquitous plastin isoforms (L and T) have been identified. Plastin 1 (otherwise known as Fimbrin) is a third distinct plastin isoform which is specifically expressed at high levels in the small intestine. The L isoform is expressed only in hemopoietic cell lineages, while the T isoform has been found in all other normal cells of solid tissues that have replicative potential (fibroblasts, endothelial cells, epithelial cells, melanocytes, etc.). However, L-plastin has been found in many types of malignant human cells of non-hemopoietic origin suggesting that its expression is induced accompanying tumorigenesis in solid tissues. 3936 ENSG00000136167 LCP1 NA
ephrin A5 Ephrin-A5, a member of the ephrin gene family, prevents axon bundling in cocultures of cortical neurons with astrocytes, a model of late stage nervous system development and differentiation. The EPH and EPH-related receptors comprise the largest subfamily of receptor protein-tyrosine kinases and have been implicated in mediating developmental events, particularly in the nervous system. EPH receptors typically have a single kinase domain and an extracellular region containing a Cys-rich domain and 2 fibronectin type III repeats. The ephrin ligands and receptors have been named by the Eph Nomenclature Committee (1997). Based on their structures and sequence relationships, ephrins are divided into the ephrin-A (EFNA) class, which are anchored to the membrane by a glycosylphosphatidylinositol linkage, and the ephrin-B (EFNB) class, which are transmembrane proteins. The Eph family of receptors are similarly divided into 2 groups based on the similarity of their extracellular domain sequences and their affinities for binding ephrin-A and ephrin-B ligands. 1946 ENSG00000184349 EFNA5 NA
glypican 1 Cell surface heparan sulfate proteoglycans are composed of a membrane-associated protein core substituted with a variable number of heparan sulfate chains. Members of the glypican-related integral membrane proteoglycan family (GRIPS) contain a core protein anchored to the cytoplasmic membrane via a glycosyl phosphatidylinositol linkage. These proteins may play a role in the control of cell division and growth regulation. 2817 ENSG00000063660 GPC1 NA
cortexin 1 NA 404217 ENSG00000178531 CTXN1 NA
FXYD domain containing ion transport regulator 6 This gene encodes a member of the FXYD family of transmembrane proteins. This particular protein encodes phosphohippolin, which likely affects the activity of Na,K-ATPase. Multiple alternatively spliced transcript variants encoding the same protein have been described. Related pseudogenes have been identified on chromosomes 10 and X. Read-through transcripts have been observed between this locus and the downstream sodium/potassium-transporting ATPase subunit gamma (FXYD2, GeneID 486) locus. 53826 ENSG00000137726 FXYD6 NA
lysyl oxidase This gene encodes a member of the lysyl oxidase family of proteins. Alternative splicing results in multiple transcript variants, at least one of which encodes a preproprotein that is proteolytically processed to generate a regulatory propeptide and the mature enzyme. The copper-dependent amine oxidase activity of this enzyme functions in the crosslinking of collagens and elastin, while the propeptide may play a role in tumor suppression. 4015 ENSG00000113083 LOX NA
inositol polyphosphate-5-phosphatase J NA 27124 ENSG00000185133 INPP5J NA
prolyl 3-hydroxylase 3 The protein encoded by this gene belongs to the leprecan family of proteoglycans, which function as collagen prolyl hydroxylases that are required for proper collagen biosynthesis, folding and assembly. This protein, like other family members, is thought to reside in the endoplasmic reticulum. Epigenetic inactivation of this gene is associated with breast and other cancers, suggesting that it may function as a tumor suppressor. 10536 ENSG00000110811 P3H3 NA
cystatin E/M The cystatin superfamily encompasses proteins that contain multiple cystatin-like sequences. Some of the members are active cysteine protease inhibitors, while others have lost or perhaps never acquired this inhibitory activity. There are three inhibitory families in the superfamily, including the type 1 cystatins (stefins), type 2 cystatins and the kininogens. The type 2 cystatin proteins are a class of cysteine proteinase inhibitors found in a variety of human fluids and secretions, where they appear to provide protective functions. This gene encodes a cystatin from the type 2 family, which is down-regulated in metastatic breast tumor cells as compared to primary tumor cells. Loss of expression is likely associated with the progression of a primary tumor to a metastatic phenotype. 1474 ENSG00000175315 CST6 NA
phospholipase C gamma 2 The protein encoded by this gene is a transmembrane signaling enzyme that catalyzes the conversion of 1-phosphatidyl-1D-myo-inositol 4,5-bisphosphate to 1D-myo-inositol 1,4,5-trisphosphate (IP3) and diacylglycerol (DAG) using calcium as a cofactor. IP3 and DAG are second messenger molecules important for transmitting signals from growth factor receptors and immune system receptors across the cell membrane. Mutations in this gene have been found in autoinflammation, antibody deficiency, and immune dysregulation syndrome and familial cold autoinflammatory syndrome 3. 5336 ENSG00000197943 PLCG2 NA
cAMP responsive element binding protein 3 like 1 The protein encoded by this gene is normally found in the membrane of the endoplasmic reticulum (ER). However, upon stress to the ER, the encoded protein is cleaved and the released cytoplasmic transcription factor domain translocates to the nucleus. There it activates the transcription of target genes by binding to box-B elements. 90993 ENSG00000157613 CREB3L1 NA
RNA, 5S ribosomal pseudogene 352 NA ENSG00000200278 ENSG00000200278 RNA5SP352 NA
cysteine rich protein 2 This gene encodes a putative transcription factor with two LIM zinc-binding domains. The encoded protein may participate in the differentiation of smooth muscle tissue. Alternative splicing results in multiple transcript variants. 1397 ENSG00000182809 CRIP2 NA
NA NA NA ENSG00000203691 NA TRUE
kelch like family member 5 NA 51088 ENSG00000109790 KLHL5 NA
myosin light chain 3 MYL3 encodes myosin light chain 3, an alkali light chain also referred to in the literature as both the ventricular isoform and the slow skeletal muscle isoform. Mutations in MYL3 have been identified as a cause of mid-left ventricular chamber type hypertrophic cardiomyopathy. 4634 ENSG00000160808 MYL3 NA
filamin binding LIM protein 1 This gene encodes a protein with an N-terminal filamin-binding domain, a central proline-rich domain, and multiple C-terminal LIM domains. This protein localizes at cell junctions and may link cell adhesion structures to the actin cytoskeleton. This protein may be involved in the assembly and stabilization of actin filaments and likely plays a role in modulating cell adhesion, cell morphology and cell motility. This protein also localizes to the nucleus and may affect cardiomyocyte differentiation after binding with the CSX/NKX2-5 transcription factor. Alternative splicing results in multiple transcript variants encoding different isoforms. 54751 ENSG00000162458 FBLIM1 NA
collagen type VII alpha 1 This gene encodes the alpha chain of type VII collagen. The type VII collagen fibril, composed of three identical alpha collagen chains, is restricted to the basement zone beneath stratified squamous epithelia. It functions as an anchoring fibril between the external epithelia and the underlying stroma. Mutations in this gene are associated with all forms of dystrophic epidermolysis bullosa. In the absence of mutations, however, an acquired form of this disease can result from an autoimmune response made to type VII collagen. 1294 ENSG00000114270 COL7A1 NA
neutrophil cytosolic factor 4 The protein encoded by this gene is a cytosolic regulatory component of the superoxide-producing phagocyte NADPH-oxidase, a multicomponent enzyme system important for host defense. This protein is preferentially expressed in cells of myeloid lineage. It interacts primarily with neutrophil cytosolic factor 2 (NCF2/p67-phox) to form a complex with neutrophil cytosolic factor 1 (NCF1/p47-phox), which further interacts with the small G protein RAC1 and translocates to the membrane upon cell stimulation. This complex then activates flavocytochrome b, the membrane-integrated catalytic core of the enzyme system. The PX domain of this protein can bind phospholipid products of the PI(3) kinase, which suggests its role in PI(3) kinase-mediated signaling events. The phosphorylation of this protein was found to negatively regulate the enzyme activity. Alternatively spliced transcript variants encoding distinct isoforms have been observed. 4689 ENSG00000100365 NCF4 NA
# Save the Ensembl IDs queried for factor 1, one per line
write.table(as.factor(out$query),
            paste0("../utilities/GTEX2013_sparse_fac_voom/gene_names_clus_", 1, ".txt"),
            col.names = FALSE, row.names = FALSE, quote = FALSE)
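
The export step above is repeated once per factor with only the index changing. A minimal sketch of how it could be wrapped in a helper is shown here; `write_factor_genes`, its arguments, and the `drop_notfound` option are hypothetical names introduced for illustration and are not part of the original analysis. The sketch assumes the annotation table has a `query` column and, as in the Factor 2 output, may carry a logical `notfound` column when some IDs fail to resolve. The toy data frame reuses three IDs from the Factor 1 table so the example runs without a network call to mygene.

```r
# Hypothetical helper (not from the original analysis): write the Ensembl IDs
# for one factor to "gene_names_clus_<k>.txt", one ID per line.
write_factor_genes <- function(annot, factor_index, out_dir, drop_notfound = FALSE) {
  ids <- as.character(annot$query)
  # Assumption: a `notfound` column is present only when some queries failed.
  if (drop_notfound && "notfound" %in% names(annot)) {
    failed <- !is.na(annot$notfound) & annot$notfound
    ids <- ids[!failed]
  }
  out_file <- file.path(out_dir, paste0("gene_names_clus_", factor_index, ".txt"))
  writeLines(ids, out_file)
  invisible(out_file)
}

# Toy stand-in for as.data.frame(out); IDs taken from the Factor 1 table.
toy <- data.frame(
  query    = c("ENSG00000184349", "ENSG00000063660", "ENSG00000203691"),
  symbol   = c("EFNA5", "GPC1", NA),
  notfound = c(NA, NA, TRUE),
  stringsAsFactors = FALSE
)

path <- write_factor_genes(toy, 1, tempdir())
readLines(path)
```

With `drop_notfound = FALSE` (the default) the helper keeps every queried ID, matching the behaviour of the `write.table` call; setting it to `TRUE` would leave out IDs that mygene could not resolve, which may or may not be desirable for downstream enrichment tools.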

Factor 2 Annotations

out <- mygene::queryMany(gene_list[2,],  scopes="ensembl.gene", fields=c("name", "summary", "symbol"), species="human");
## Finished
## Pass returnall=TRUE to return lists of duplicate or missing query terms.
kable(as.data.frame(out))
X_id summary name symbol query notfound
4311 This gene encodes a common acute lymphocytic leukemia antigen that is an important cell surface marker in the diagnosis of human acute lymphocytic leukemia (ALL). This protein is present on leukemic cells of pre-B phenotype, which represent 85% of cases of ALL. This protein is not restricted to leukemic cells, however, and is found on a variety of normal tissues. It is a glycoprotein that is particularly abundant in kidney, where it is present on the brush border of proximal tubules and on glomerular epithelium. The protein is a neutral endopeptidase that cleaves peptides at the amino side of hydrophobic residues and inactivates several peptide hormones including glucagon, enkephalins, substance P, neurotensin, oxytocin, and bradykinin. This gene, which encodes a 100-kD type II transmembrane glycoprotein, exists in a single copy of greater than 45 kb. The 5’ untranslated region of this gene is alternatively spliced, resulting in four separate mRNA transcripts. The coding region is not affected by alternative splicing. membrane metallo-endopeptidase MME ENSG00000196549 NA
4897 Cell adhesion molecules (CAMs) are members of the immunoglobulin superfamily. This gene encodes a neuronal cell adhesion molecule with multiple immunoglobulin-like C2-type domains and fibronectin type-III domains. This ankyrin-binding protein is involved in neuron-neuron adhesion and promotes directional signaling during axonal cone growth. This gene is also expressed in non-neural tissues and may play a general role in cell-cell communication via signaling from its intracellular domain to the actin cytoskeleton during directional cell migration. Allelic variants of this gene have been associated with autism and addiction vulnerability. Alternative splicing results in multiple transcript variants encoding different isoforms. neuronal cell adhesion molecule NRCAM ENSG00000091129 NA
2938 This gene encodes a member of a family of enzymes that function to add glutathione to target electrophilic compounds, including carcinogens, therapeutic drugs, environmental toxins, and products of oxidative stress. This action is an important step in detoxification of these compounds. This subfamily of enzymes has a particular role in protecting cells from reactive oxygen species and the products of peroxidation. Polymorphisms in this gene influence the ability of individuals to metabolize different drugs. This gene is located in a cluster of similar genes and pseudogenes on chromosome 6. Alternative splicing results in multiple transcript variants. glutathione S-transferase alpha 1 GSTA1 ENSG00000243955 NA
2243 This gene encodes the alpha subunit of the coagulation factor fibrinogen, which is a component of the blood clot. Following vascular injury, the encoded preproprotein is proteolytically processed by thrombin during the conversion of fibrinogen to fibrin. Mutations in this gene lead to several disorders, including dysfibrinogenemia, hypofibrinogenemia, afibrinogenemia and renal amyloidosis. Alternative splicing results in multiple transcript variants, at least one of which encodes an isoform that undergoes proteolytic processing. fibrinogen alpha chain FGA ENSG00000171560 NA
259 This gene encodes a complex glycoprotein secreted in plasma. The precursor is proteolytically processed into distinct functioning proteins: alpha-1-microglobulin, which belongs to the superfamily of lipocalin transport proteins and may play a role in the regulation of inflammatory processes, and bikunin, which is a urinary trypsin inhibitor belonging to the superfamily of Kunitz-type protease inhibitors and plays an important role in many physiological and pathological processes. This gene is located on chromosome 9 in a cluster of lipocalin genes. alpha-1-microglobulin/bikunin precursor AMBP ENSG00000106927 NA
2244 The protein encoded by this gene is the beta component of fibrinogen, a blood-borne glycoprotein comprised of three pairs of nonidentical polypeptide chains. Following vascular injury, fibrinogen is cleaved by thrombin to form fibrin which is the most abundant component of blood clots. In addition, various cleavage products of fibrinogen and fibrin regulate cell adhesion and spreading, display vasoconstrictor and chemotactic activities, and are mitogens for several cell types. Mutations in this gene lead to several disorders, including afibrinogenemia, dysfibrinogenemia, hypodysfibrinogenemia and thrombotic tendency. Alternatively spliced transcript variants encoding different isoforms have been found for this gene. fibrinogen beta chain FGB ENSG00000171564 NA
ENSG00000254902 NA ANO1 antisense RNA 1 ANO1-AS1 ENSG00000254902 NA
10912 This gene is a member of a group of genes whose transcript levels are increased following stressful growth arrest conditions and treatment with DNA-damaging agents. The protein encoded by this gene responds to environmental stresses by mediating activation of the p38/JNK pathway via MTK1/MEKK4 kinase. The GADD45G is highly expressed in placenta. growth arrest and DNA damage inducible gamma GADD45G ENSG00000130222 NA
8470 Arg and c-Abl represent the mammalian members of the Abelson family of non-receptor protein-tyrosine kinases. They interact with the Arg/Abl binding proteins via the SH3 domains present in the carboxy end of the latter group of proteins. This gene encodes the sorbin and SH3 domain containing 2 protein. It has three C-terminal SH3 domains and an N-terminal sorbin homology (SoHo) domain that interacts with lipid raft proteins. The subcellular localization of this protein in epithelial and cardiac muscle cells suggests that it functions as an adapter protein to assemble signaling complexes in stress fibers, and that it is a potential link between Abl family kinases and the actin cytoskeleton. Alternative splicing results in multiple transcript variants encoding different isoforms. sorbin and SH3 domain containing 2 SORBS2 ENSG00000154556 NA
213 Albumin is a soluble, monomeric protein which comprises about one-half of the blood serum protein. Albumin functions primarily as a carrier protein for steroids, fatty acids, and thyroid hormones and plays a role in stabilizing extracellular fluid volume. Albumin is a globular unglycosylated serum protein of molecular weight 65,000. Albumin is synthesized in the liver as preproalbumin which has an N-terminal peptide that is removed before the nascent protein is released from the rough endoplasmic reticulum. The product, proalbumin, is in turn cleaved in the Golgi vesicles to produce the secreted albumin. albumin ALB ENSG00000163631 NA
78989 This gene encodes a member of the collectin family of C-type lectins that possess collagen-like sequences and carbohydrate recognition domains. Collectins are secreted proteins that play important roles in the innate immune system by binding to carbohydrate antigens on microorganisms, facilitating their recognition and removal. The encoded protein binds to multiple sugars with a preference for fucose and mannose. Mutations in this gene are a cause of 3MC syndrome-2. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene. collectin subfamily member 11 COLEC11 ENSG00000118004 NA
ENSG00000225670 NA CADM3 antisense RNA 1 CADM3-AS1 ENSG00000225670 NA
7448 The protein encoded by this gene is a member of the pexin family. It is found in serum and tissues and promotes cell adhesion and spreading, inhibits the membrane-damaging effect of the terminal cytolytic complement pathway, and binds to several serpin serine protease inhibitors. It is a secreted protein and exists in either a single chain form or a clipped, two chain form held together by a disulfide bond. vitronectin VTN ENSG00000109072 NA
350 Apolipoprotein H has been implicated in a variety of physiologic pathways including lipoprotein metabolism, coagulation, and the production of antiphospholipid autoantibodies. APOH may be a required cofactor for anionic phospholipid binding by the antiphospholipid autoantibodies found in sera of many patients with lupus and primary antiphospholipid syndrome, but it does not seem to be required for the reactivity of antiphospholipid autoantibodies associated with infections. apolipoprotein H APOH ENSG00000091583 NA
23498 3-Hydroxyanthranilate 3,4-dioxygenase is a monomeric cytosolic protein belonging to the family of intramolecular dioxygenases containing nonheme ferrous iron. It is widely distributed in peripheral organs, such as liver and kidney, and is also present in low amounts in the central nervous system. HAAO catalyzes the synthesis of quinolinic acid (QUIN) from 3-hydroxyanthranilic acid. QUIN is an excitotoxin whose toxicity is mediated by its ability to activate glutamate N-methyl-D-aspartate receptors. Increased cerebral levels of QUIN may participate in the pathogenesis of neurologic and inflammatory disorders. HAAO has been suggested to play a role in disorders associated with altered tissue levels of QUIN. 3-hydroxyanthranilate 3,4-dioxygenase HAAO ENSG00000162882 NA
100873993 NA ITIH4 antisense RNA 1 ITIH4-AS1 ENSG00000239799 NA
57863 IGSF4B is a brain-specific protein related to the calcium-independent cell-cell adhesion molecules known as nectins (see PVRL3; MIM 607147) (Kakunaga et al., 2005 [PubMed 15741237]). cell adhesion molecule 3 CADM3 ENSG00000162706 NA
1571 This gene encodes a member of the cytochrome P450 superfamily of enzymes. The cytochrome P450 proteins are monooxygenases which catalyze many reactions involved in drug metabolism and synthesis of cholesterol, steroids and other lipids. This protein localizes to the endoplasmic reticulum and is induced by ethanol, the diabetic state, and starvation. The enzyme metabolizes both endogenous substrates, such as ethanol, acetone, and acetal, as well as exogenous substrates including benzene, carbon tetrachloride, ethylene glycol, and nitrosamines which are premutagens found in cigarette smoke. Due to its many substrates, this enzyme may be involved in such varied processes as gluconeogenesis, hepatic cirrhosis, diabetes, and cancer. cytochrome P450 family 2 subfamily E member 1 CYP2E1 ENSG00000130649 NA
9435 This locus encodes a sulfotransferase protein. The encoded enzyme catalyzes the sulfation of a nonreducing N-acetylglucosamine residue, and may play a role in biosynthesis of 6-sulfosialyl Lewis X antigen. carbohydrate sulfotransferase 2 CHST2 ENSG00000175040 NA
5507 This gene encodes a regulatory subunit of protein phosphatase-1 (PP1). PP1 catalyzes reversible protein phosphorylation, which is important in a wide range of cellular activities: neuronal, muscular, RNA splicing, protein synthesis, cell death, and glycogen metabolism, to name just a few. By interacting with different regulatory subunits, PP1 is directed to different parts of the cell, to different substrates, or to respond to extracellular signals. protein phosphatase 1 regulatory subunit 3C PPP1R3C ENSG00000119938 NA
2266 The protein encoded by this gene is the gamma component of fibrinogen, a blood-borne glycoprotein comprised of three pairs of nonidentical polypeptide chains. Following vascular injury, fibrinogen is cleaved by thrombin to form fibrin which is the most abundant component of blood clots. In addition, various cleavage products of fibrinogen and fibrin regulate cell adhesion and spreading, display vasoconstrictor and chemotactic activities, and are mitogens for several cell types. Mutations in this gene lead to several disorders, including dysfibrinogenemia, hypofibrinogenemia and thrombophilia. Alternative splicing results in transcript variants encoding different isoforms. fibrinogen gamma chain FGG ENSG00000171557 NA
335 This gene encodes apolipoprotein A-I, which is the major protein component of high density lipoprotein (HDL) in plasma. The encoded preproprotein is proteolytically processed to generate the mature protein, which promotes cholesterol efflux from tissues to the liver for excretion, and is a cofactor for lecithin cholesterolacyltransferase (LCAT), an enzyme responsible for the formation of most plasma cholesteryl esters. This gene is closely linked with two other apolipoprotein genes on chromosome 11. Defects in this gene are associated with HDL deficiencies, including Tangier disease, and with systemic non-neuropathic amyloidosis. Alternative splicing results in multiple transcript variants, at least one of which encodes a preproprotein. apolipoprotein A1 APOA1 ENSG00000118137 NA
81035 This gene encodes a member of the C-lectin family, proteins that possess collagen-like sequences and carbohydrate recognition domains. This protein is a scavenger receptor, a cell surface glycoprotein that displays several functions associated with host defense. It can bind to carbohydrate antigens on microorganisms, facilitating their recognition and removal. It also mediates the recognition, internalization, and degradation of oxidatively modified low density lipoprotein by vascular endothelial cells. collectin subfamily member 12 COLEC12 ENSG00000158270 NA
ENSG00000271857 NA NA RP1-244F24.1 ENSG00000271857 NA
148534 NA transmembrane protein 56 TMEM56 ENSG00000152078 NA
7439 This gene encodes a member of the bestrophin gene family. This small gene family is characterized by proteins with a highly conserved N-terminus with four to six transmembrane domains. Bestrophins may form chloride ion channels or may regulate voltage-gated L-type calcium-ion channels. Bestrophins are generally believed to form calcium-activated chloride-ion channels in epithelial cells but they have also been shown to be highly permeable to bicarbonate ion transport in retinal tissue. Mutations in this gene are responsible for juvenile-onset vitelliform macular dystrophy (VMD2), also known as Best macular dystrophy, in addition to adult-onset vitelliform macular dystrophy (AVMD) and other retinopathies. Alternative splicing results in multiple variants encoding distinct isoforms. bestrophin 1 BEST1 ENSG00000167995 NA
ENSG00000232815 NA double homeobox 4 like 50, pseudogene DUX4L50 ENSG00000232815 NA
10809 NA StAR related lipid transfer domain containing 10 STARD10 ENSG00000214530 NA
10098 The protein encoded by this gene is a member of the transmembrane 4 superfamily, also known as the tetraspanin family. Most of these members are cell-surface proteins that are characterized by the presence of four hydrophobic domains. The proteins mediate signal transduction events that play a role in the regulation of cell development, activation, growth and motility. tetraspanin 5 TSPAN5 ENSG00000168785 NA
345 Apolipoprotein C-III is a very low density lipoprotein (VLDL) protein. APOC3 inhibits lipoprotein lipase and hepatic lipase; it is thought to delay catabolism of triglyceride-rich particles. The APOA1, APOC3 and APOA4 genes are closely linked in both rat and human genomes. The A-I and A-IV genes are transcribed from the same strand, while the A-I and C-III genes are convergently transcribed. An increase in apoC-III levels induces the development of hypertriglyceridemia. apolipoprotein C3 APOC3 ENSG00000110245 NA
229 Fructose-1,6-bisphosphate aldolase (EC 4.1.2.13) is a tetrameric glycolytic enzyme that catalyzes the reversible conversion of fructose-1,6-bisphosphate to glyceraldehyde 3-phosphate and dihydroxyacetone phosphate. Vertebrates have 3 aldolase isozymes which are distinguished by their electrophoretic and catalytic properties. Differences indicate that aldolases A, B, and C are distinct proteins, the products of a family of related ‘housekeeping’ genes exhibiting developmentally regulated expression of the different isozymes. The developing embryo produces aldolase A, which is produced in even greater amounts in adult muscle where it can be as much as 5% of total cellular protein. In adult liver, kidney and intestine, aldolase A expression is repressed and aldolase B is produced. In brain and other nervous tissue, aldolase A and C are expressed about equally. There is a high degree of homology between aldolase A and C. Defects in ALDOB cause hereditary fructose intolerance. aldolase, fructose-bisphosphate B ALDOB ENSG00000136872 NA
3242 The protein encoded by this gene is an enzyme in the catabolic pathway of tyrosine. The encoded protein catalyzes the conversion of 4-hydroxyphenylpyruvate to homogentisate. Defects in this gene are a cause of tyrosinemia type 3 (TYRO3) and hawkinsinuria (HAWK). Two transcript variants encoding different isoforms have been found for this gene. 4-hydroxyphenylpyruvate dioxygenase HPD ENSG00000158104 NA
104326055 NA APOA1 antisense RNA APOA1-AS ENSG00000235910 NA
57168 NA aspartate beta-hydroxylase domain containing 2 ASPHD2 ENSG00000128203 NA
3699 This gene encodes the heavy chain subunit of the pre-alpha-trypsin inhibitor complex. This complex may stabilize the extracellular matrix through its ability to bind hyaluronic acid. Polymorphisms of this gene may be associated with increased risk for schizophrenia and major depressive disorder. This gene is present in an inter-alpha-trypsin inhibitor family gene cluster on chromosome 3. inter-alpha-trypsin inhibitor heavy chain 3 ITIH3 ENSG00000162267 NA
80714 This gene encodes a member of the pre-B cell leukemia transcription factor family. These proteins are homeobox proteins that play critical roles in embryonic development and cellular differentiation both as Hox cofactors and through Hox-independent pathways. The encoded protein contains a homeobox DNA-binding domain, but specific functions of the protein have not been determined. Alternatively spliced transcript variants have been observed for this gene. PBX homeobox 4 PBX4 ENSG00000105717 NA
338773 NA transmembrane protein 119 TMEM119 ENSG00000183160 NA
5376 This gene encodes an integral membrane protein that is a major component of myelin in the peripheral nervous system. Studies suggest two alternately used promoters drive tissue-specific expression. Various mutations of this gene are causes of Charcot-Marie-Tooth disease Type IA, Dejerine-Sottas syndrome, and hereditary neuropathy with liability to pressure palsies. Alternative splicing results in multiple transcript variants. peripheral myelin protein 22 PMP22 ENSG00000109099 NA
183 The protein encoded by this gene, pre-angiotensinogen or angiotensinogen precursor, is expressed in the liver and is cleaved by the enzyme renin in response to lowered blood pressure. The resulting product, angiotensin I, is then cleaved by angiotensin converting enzyme (ACE) to generate the physiologically active peptide angiotensin II. The protein is involved in maintaining blood pressure and in the pathogenesis of essential hypertension and preeclampsia. Mutations in this gene are associated with susceptibility to essential hypertension, and can cause renal tubular dysgenesis, a severe disorder of renal tubular development. Defects in this gene have also been associated with non-familial structural atrial fibrillation, and inflammatory bowel disease. angiotensinogen AGT ENSG00000135744 NA
1558 This gene encodes a member of the cytochrome P450 superfamily of enzymes. The cytochrome P450 proteins are monooxygenases which catalyze many reactions involved in drug metabolism and synthesis of cholesterol, steroids and other lipids. This protein localizes to the endoplasmic reticulum and its expression is induced by phenobarbital. The enzyme is known to metabolize many xenobiotics, including the anticonvulsive drug mephenytoin, benzo(a)pyrene, 7-ethoxycoumarin, and the anti-cancer drug taxol. This gene is located within a cluster of cytochrome P450 genes on chromosome 10q24. Several transcript variants encoding a few different isoforms have been found for this gene. cytochrome P450 family 2 subfamily C member 8 CYP2C8 ENSG00000138115 NA
ENSG00000266844 NA NA RP11-862L9.3 ENSG00000266844 NA
23406 This gene encodes one of the numerous actin-binding proteins which regulate the actin cytoskeleton. This protein binds F-actin, and also interacts with 5-lipoxygenase, which is the first committed enzyme in leukotriene biosynthesis. Although this gene has been reported to map to chromosome 17 in the Smith-Magenis syndrome region, the best alignments for this gene are to chromosome 16. The Smith-Magenis syndrome region is the site of two related pseudogenes. coactosin like F-actin binding protein 1 COTL1 ENSG00000103187 NA
7070 This gene encodes a cell surface glycoprotein and member of the immunoglobulin superfamily of proteins. The encoded protein is involved in cell adhesion and cell communication in numerous cell types, but particularly in cells of the immune and nervous systems. The encoded protein is widely used as a marker for hematopoietic stem cells. This gene may function as a tumor suppressor in nasopharyngeal carcinoma. Alternative splicing results in multiple transcript variants. Thy-1 cell surface antigen THY1 ENSG00000154096 NA
4974 NA oligodendrocyte myelin glycoprotein OMG ENSG00000126861 NA
388849 NA coiled-coil domain containing 188 CCDC188 ENSG00000234409 NA
1368 The protein encoded by this gene is a membrane-bound arginine/lysine carboxypeptidase. Its expression is associated with monocyte to macrophage differentiation. This encoded protein contains hydrophobic regions at the amino and carboxy termini and has 6 potential asparagine-linked glycosylation sites. The active site residues of carboxypeptidases A and B are conserved in this protein. Three alternatively spliced transcript variants encoding the same protein have been described for this gene. carboxypeptidase M CPM ENSG00000135678 NA
55890 The protein encoded by this gene is a member of the type 3 G protein-coupled receptor family. Members of this superfamily are characterized by a signature 7-transmembrane domain motif. The specific function of this protein is unknown; however, this protein may mediate the cellular effects of retinoic acid on the G protein signal transduction cascade. Two transcript variants encoding different isoforms have been found for this gene. G protein-coupled receptor class C group 5 member C GPRC5C ENSG00000170412 NA
22824 The protein encoded by this gene is heat shock inducible and may act as a chaperone. The encoded protein can protect the heat-shocked cell against the harmful effects of aggregated proteins. This gene is highly expressed in leukemia cells and may be a good target for therapeutic intervention. Several transcripts encoding different isoforms have been found for this gene. heat shock protein family A (Hsp70) member 4 like HSPA4L ENSG00000164070 NA
23554 The protein encoded by this gene is a member of the transmembrane 4 superfamily, also known as the tetraspanin family. Most of these members are cell-surface proteins that are characterized by the presence of four hydrophobic domains. The proteins mediate signal transduction events that play a role in the regulation of cell development, activation, growth and motility. tetraspanin 12 TSPAN12 ENSG00000106025 NA
84952 This gene encodes a member of the cingulin family. The encoded protein localizes to both adherens and tight cell-cell junctions and mediates junction assembly and maintenance by regulating the activity of the small GTPases RhoA and Rac1. Heterozygous chromosomal rearrangements resulting in association of the promoter for this gene with the aromatase gene are a cause of aromatase excess syndrome. Alternatively spliced transcript variants have been observed for this gene. cingulin-like 1 CGNL1 ENSG00000128849 NA
ENSG00000263873 NA NA RP11-334E6.12 ENSG00000263873 NA
112817 The authors of PMID:20797690 cloned this gene while searching for genes in a region of chromosome 10 linked to primary hyperoxaluria type III. They noted that even though the encoded protein has been described as a mitochondrial dihydrodipicolinate synthase-like enzyme, it shares little homology with E. coli dihydrodipicolinate synthase (Dhdps), particularly in the putative substrate-binding region. Moreover, neither lysine biosynthesis nor sialic acid metabolism, for which Dhdps is responsible, occurs in vertebrate mitochondria. They propose that this gene encodes mitochondrial 4-hydroxyl-2-oxoglutarate aldolase (EC 4.1.3.16), which catalyzes the final step in the metabolic pathway of hydroxyproline, releasing glyoxylate and pyruvate. This gene is predominantly expressed in the liver and kidney, and mutations in this gene are found in patients with primary hyperoxaluria type III. Alternatively spliced transcript variants encoding different isoforms have been noted for this gene. 4-hydroxy-2-oxoglutarate aldolase 1 HOGA1 ENSG00000241935 NA
29984 Ras homolog, or Rho, proteins interact with protein kinases and may serve as targets for activated GTPase. They play a critical role in muscle differentiation. The protein encoded by this gene binds GTP and is a member of the small GTPase superfamily. It is involved in endosome dynamics and reorganization of the actin cytoskeleton, and it may coordinate membrane transport with the function of the cytoskeleton. Two transcript variants encoding different isoforms have been found for this gene. ras homolog family member D RHOD ENSG00000173156 NA
NA NA NA NA ENSG00000255824 TRUE
336 This gene encodes apolipoprotein (apo-) A-II, which is the second most abundant protein of the high density lipoprotein particles. The protein is found in plasma as a monomer, homodimer, or heterodimer with apolipoprotein D. Defects in this gene may result in apolipoprotein A-II deficiency or hypercholesterolemia. apolipoprotein A2 APOA2 ENSG00000158874 NA
115908 This locus encodes a protein that may play a role in the cellular response to arterial injury through involvement in vascular remodeling. Mutations at this locus have been associated with Barrett esophagus and esophageal adenocarcinoma. Alternatively spliced transcript variants have been described. collagen triple helix repeat containing 1 CTHRC1 ENSG00000164932 NA
9536 The protein encoded by this gene is a glutathione-dependent prostaglandin E synthase. The expression of this gene has been shown to be induced by proinflammatory cytokine interleukin 1 beta (IL1B). Its expression can also be induced by tumor suppressor protein TP53, and may be involved in TP53 induced apoptosis. Knockout studies in mice suggest that this gene may contribute to the pathogenesis of collagen-induced arthritis and mediate acute pain during inflammatory responses. prostaglandin E synthase PTGES ENSG00000148344 NA
3557 The protein encoded by this gene is a member of the interleukin 1 cytokine family. This protein inhibits the activities of interleukin 1, alpha (IL1A) and interleukin 1, beta (IL1B), and modulates a variety of interleukin 1 related immune and inflammatory responses. This gene and five other closely related cytokine genes form a gene cluster spanning approximately 400 kb on chromosome 2. A polymorphism of this gene is reported to be associated with increased risk of osteoporotic fractures and gastric cancer. Several alternatively spliced transcript variants encoding distinct isoforms have been reported. interleukin 1 receptor antagonist IL1RN ENSG00000136689 NA
100507392 NA smooth muscle and endothelial cell enriched migration/differentiation-associated long non-coding RNA SENCR ENSG00000254703 NA
26251 Voltage-gated potassium (Kv) channels represent the most complex class of voltage-gated ion channels from both functional and structural standpoints. Their diverse functions include regulating neurotransmitter release, heart rate, insulin secretion, neuronal excitability, epithelial electrolyte transport, smooth muscle contraction, and cell volume. This gene encodes a member of the potassium channel, voltage-gated, subfamily G. This member is a gamma subunit of the voltage-gated potassium channel. The delayed-rectifier type channels containing this subunit may contribute to cardiac action potential repolarization. potassium voltage-gated channel modifier subfamily G member 2 KCNG2 ENSG00000178342 NA
1401 The protein encoded by this gene belongs to the pentaxin family. It is involved in several host defense related functions based on its ability to recognize foreign pathogens and damaged cells of the host and to initiate their elimination by interacting with humoral and cellular effector systems in the blood. Consequently, the level of this protein in plasma increases greatly during acute phase response to tissue injury, infection, or other inflammatory stimuli. C-reactive protein, pentraxin-related CRP ENSG00000132693 NA
253982 NA aspartate beta-hydroxylase domain containing 1 ASPHD1 ENSG00000174939 NA
2053 This gene encodes a member of the epoxide hydrolase family. The protein, found in both the cytosol and peroxisomes, binds to specific epoxides and converts them to the corresponding dihydrodiols. Mutations in this gene have been associated with familial hypercholesterolemia. Alternatively spliced transcript variants have been described. epoxide hydrolase 2 EPHX2 ENSG00000120915 NA
ENSG00000271833 NA NA RP11-356B19.11 ENSG00000271833 NA
10788 This gene encodes a member of the IQGAP family. The protein contains three IQ domains, one calponin homology domain, one Ras-GAP domain and one WW domain. It interacts with components of the cytoskeleton, with cell adhesion molecules, and with several signaling molecules to regulate cell morphology and motility. IQ motif containing GTPase activating protein 2 IQGAP2 ENSG00000145703 NA
84842 NA 4-hydroxyphenylpyruvate dioxygenase like HPDL ENSG00000186603 NA
1636 This gene encodes an enzyme involved in catalyzing the conversion of angiotensin I into a physiologically active peptide angiotensin II. Angiotensin II is a potent vasopressor and aldosterone-stimulating peptide that controls blood pressure and fluid-electrolyte balance. This enzyme plays a key role in the renin-angiotensin system. Many studies have associated the presence or absence of a 287 bp Alu repeat element in this gene with the levels of circulating enzyme or cardiovascular pathophysiologies. Multiple alternatively spliced transcript variants encoding different isoforms have been identified, and two most abundant spliced variants encode the somatic form and the testicular form, respectively, that are equally active. angiotensin I converting enzyme ACE ENSG00000159640 NA
347 This gene encodes a component of high density lipoprotein that has no marked similarity to other apolipoprotein sequences. It has a high degree of homology to plasma retinol-binding protein and other members of the alpha 2 microglobulin protein superfamily of carrier proteins, also known as lipocalins. This glycoprotein is closely associated with the enzyme lecithin:cholesterol acyltransferase - an enzyme involved in lipoprotein metabolism. apolipoprotein D APOD ENSG00000189058 NA
27165 The protein encoded by this gene is a mitochondrial phosphate-activated glutaminase that catalyzes the hydrolysis of glutamine to stoichiometric amounts of glutamate and ammonia. Originally thought to be liver-specific, this protein has been found in other tissues as well. Alternative splicing results in multiple transcript variants that encode different isoforms. glutaminase 2 GLS2 ENSG00000135423 NA
3856 This gene is a member of the type II keratin family clustered on the long arm of chromosome 12. Type I and type II keratins heteropolymerize to form intermediate-sized filaments in the cytoplasm of epithelial cells. The product of this gene typically dimerizes with keratin 18 to form an intermediate filament in simple single-layered epithelial cells. This protein plays a role in maintaining cellular structural integrity and also functions in signal transduction and cellular differentiation. Mutations in this gene cause cryptogenic cirrhosis. Alternatively spliced transcript variants have been found for this gene. keratin 8 KRT8 ENSG00000170421 NA
5569 The protein encoded by this gene is a member of the cAMP-dependent protein kinase (PKA) inhibitor family. This protein was demonstrated to interact with and inhibit the activities of both C alpha and C beta catalytic subunits of the PKA. Alternatively spliced transcript variants encoding the same protein have been reported. protein kinase (cAMP-dependent, catalytic) inhibitor alpha PKIA ENSG00000171033 NA
55711 This gene belongs to the short chain dehydrogenase/reductase superfamily. It encodes a reductase enzyme involved in the first step of wax biosynthesis wherein fatty acids are converted to fatty alcohols. The encoded peroxisomal protein utilizes saturated fatty acids of 16 or 18 carbons as preferred substrates. Alternatively spliced transcript variants have been observed for this gene. Related pseudogenes have been identified on chromosomes 2, 14 and 22. fatty acyl-CoA reductase 2 FAR2 ENSG00000064763 NA
11185 N-methylation of endogenous and xenobiotic compounds is a major method by which they are degraded. This gene encodes an enzyme that N-methylates indoles such as tryptamine. Alternative splicing results in multiple transcript variants. Read-through transcription also exists between this gene and the downstream FAM188B (family with sequence similarity 188, member B) gene. indolethylamine N-methyltransferase INMT ENSG00000241644 NA
165215 NA family with sequence similarity 171 member B FAM171B ENSG00000144369 NA
1123 This gene encodes GTPase-activating protein for ras-related p21-rac and a phorbol ester receptor. It is predominantly expressed in neurons, and plays an important role in neuronal signal-transduction mechanisms. Mutations in this gene are associated with Duane’s retraction syndrome 2 (DURS2). Alternatively spliced transcript variants encoding different isoforms have been described for this gene. chimerin 1 CHN1 ENSG00000128656 NA
10435 CDC42, a small Rho GTPase, regulates the formation of F-actin-containing structures through its interaction with the downstream effector proteins. The protein encoded by this gene is a member of the Borg family of CDC42 effector proteins. Borg family proteins contain a CRIB (Cdc42/Rac interactive-binding) domain. They bind to, and negatively regulate the function of CDC42. Coexpression of this protein with CDC42 suggested a role of this protein in actin filament assembly and cell shape control. CDC42 effector protein 2 CDC42EP2 ENSG00000149798 NA
150946 NA GRB2 associated regulator of MAPK1 subtype 2 GAREM2 ENSG00000157833 NA
4644 This gene is one of three myosin V heavy-chain genes, belonging to the myosin gene superfamily. Myosin V is a class of actin-based motor proteins involved in cytoplasmic vesicle transport and anchorage, spindle-pole alignment and mRNA translocation. The protein encoded by this gene is abundant in melanocytes and nerve cells. Mutations in this gene cause Griscelli syndrome type-1 (GS1), Griscelli syndrome type-3 (GS3) and neuroectodermal melanolysosomal disease, or Elejalde disease. Multiple alternatively spliced transcript variants encoding different isoforms have been reported, but the full-length nature of some variants has not been determined. myosin VA MYO5A ENSG00000197535 NA
79772 NA multiple C2 and transmembrane domain containing 1 MCTP1 ENSG00000175471 NA
25894 The protein encoded by this gene can function as a guanine nucleotide exchange factor (GEF) and may play a role in intracellular signaling and cytoskeleton dynamics at the Golgi apparatus. Polymorphisms in the region of this gene have been found to be associated with spinocerebellar ataxia in some study populations. Alternative splicing results in multiple transcript variants. pleckstrin homology and RhoGEF domain containing G4 PLEKHG4 ENSG00000196155 NA
101928158 NA LAMA5 antisense RNA 1 LAMA5-AS1 ENSG00000228812 NA
2534 This gene is a member of the protein-tyrosine kinase oncogene family. It encodes a membrane-associated tyrosine kinase that has been implicated in the control of cell growth. The protein associates with the p85 subunit of phosphatidylinositol 3-kinase and interacts with the fyn-binding protein. Alternatively spliced transcript variants encoding distinct isoforms exist. FYN proto-oncogene, Src family tyrosine kinase FYN ENSG00000010810 NA
22821 This gene encodes a protein that binds inositol 1,3,4,5-tetrakisphosphate and stimulates the GTPase activity of Ras p21. This protein functions as a negative regulator of the Ras signalling pathway. It is localized to the cell membrane via a pleckstrin homology (PH) domain in the C-terminal region. Alternative splicing results in multiple transcript variants. RAS p21 protein activator 3 RASA3 ENSG00000185989 NA
109 This gene encodes adenylyl cyclase 3 which is a membrane-associated enzyme and catalyzes the formation of the secondary messenger cyclic adenosine monophosphate (cAMP). This protein appears to be widely expressed in various human tissues and may be involved in a number of physiological and pathophysiological metabolic processes. Two transcript variants encoding different isoforms have been found for this gene. adenylate cyclase 3 ADCY3 ENSG00000138031 NA
3984 There are approximately 40 known eukaryotic LIM proteins, so named for the LIM domains they contain. LIM domains are highly conserved cysteine-rich structures containing 2 zinc fingers. Although zinc fingers usually function by binding to DNA or RNA, the LIM motif probably mediates protein-protein interactions. LIM kinase-1 and LIM kinase-2 belong to a small subfamily with a unique combination of 2 N-terminal LIM motifs and a C-terminal protein kinase domain. LIMK1 is a serine/threonine kinase that regulates actin polymerization via phosphorylation and inactivation of the actin binding factor cofilin. This protein is ubiquitously expressed during development and plays a role in many cellular processes associated with cytoskeletal structure. This protein also stimulates axon growth and may play a role in brain development. LIMK1 hemizygosity is implicated in the impaired visuospatial constructive cognition of Williams syndrome. Alternative splicing results in multiple transcript variants encoding distinct isoforms. LIM domain kinase 1 LIMK1 ENSG00000106683 NA
9057 NA solute carrier family 7 member 6 SLC7A6 ENSG00000103064 NA
83875 This gene encodes an enzyme which oxidizes carotenoids such as beta-carotene during the biosynthesis of vitamin A. Multiple transcript variants encoding different isoforms have been found for this gene. beta-carotene oxygenase 2 BCO2 ENSG00000197580 NA
5320 The protein encoded by this gene is a member of the phospholipase A2 family (PLA2). PLA2s constitute a diverse family of enzymes with respect to sequence, function, localization, and divalent cation requirements. This gene product belongs to group II, which contains secreted form of PLA2, an extracellular enzyme that has a low molecular mass and requires calcium ions for catalysis. It catalyzes the hydrolysis of the sn-2 fatty acid acyl ester bond of phosphoglycerides, releasing free fatty acids and lysophospholipids, and thought to participate in the regulation of the phospholipid metabolism in biomembranes. Several alternatively spliced transcript variants with different 5’ UTRs have been found for this gene. phospholipase A2 group IIA PLA2G2A ENSG00000188257 NA
84627 This gene encodes a zinc-finger protein. Low-percent homology to certain collagens suggests that it may function as a transcription factor or extra-nuclear regulator factor for the synthesis or organization of collagen fibers. Mutations in this gene cause brittle cornea syndrome. zinc finger protein 469 ZNF469 ENSG00000225614 NA
114804 NA ring finger protein 157 RNF157 ENSG00000141576 NA
NA NA NA NA ENSG00000272016 TRUE
85464 This gene encodes a protein tyrosine phosphatase that plays a key role in the regulation of actin filaments. The encoded protein dephosphorylates and activates cofilin, which promotes actin filament depolymerization. Alternative splicing results in multiple transcript variants. slingshot protein phosphatase 2 SSH2 ENSG00000141298 NA
9196 This gene encodes a member of the potassium channel, voltage-gated, shaker-related subfamily. The encoded protein is one of the beta subunits, which are auxiliary proteins associating with functional Kv-alpha subunits. The encoded protein forms a heterodimer with the potassium voltage-gated channel, shaker-related subfamily, member 5 gene product and regulates the activity of the alpha subunit. potassium voltage-gated channel subfamily A regulatory beta subunit 3 KCNAB3 ENSG00000170049 NA
283375 The protein encoded by this gene belongs to the ZIP family of zinc transporters that transport zinc into cells from outside, and play a crucial role in controlling intracellular zinc levels. Zinc is an essential cofactor for many enzymes and proteins involved in gene transcription, growth, development and differentiation. Mutations in this gene have been associated with autosomal dominant high myopia (MYP24). Alternatively spliced transcript variants have been found for this gene. solute carrier family 39 member 5 SLC39A5 ENSG00000139540 NA
84909 This gene encodes a member of the M1 zinc aminopeptidase family. The encoded protein is a zinc-dependent metallopeptidase that catalyzes the removal of an amino acid from the amino terminus of a protein or peptide. This protein may play a role in the generation of angiotensin IV. Alternate splicing results in multiple transcript variants. chromosome 9 open reading frame 3 C9orf3 ENSG00000148120 NA
2628 This gene encodes a mitochondrial enzyme that belongs to the amidinotransferase family. This enzyme is involved in creatine biosynthesis, whereby it catalyzes the transfer of a guanido group from L-arginine to glycine, resulting in guanidinoacetic acid, the immediate precursor of creatine. Mutations in this gene cause arginine:glycine amidinotransferase deficiency, an inborn error of creatine synthesis characterized by mental retardation, language impairment, and behavioral disorders. glycine amidinotransferase GATM ENSG00000171766 NA
10659 Members of the CELF/BRUNOL protein family contain two N-terminal RNA recognition motif (RRM) domains, one C-terminal RRM domain, and a divergent segment of 160-230 aa between the second and third RRM domains. Members of this protein family regulate pre-mRNA alternative splicing and may also be involved in mRNA editing, and translation. Alternative splicing results in multiple transcript variants encoding different isoforms. CUGBP, Elav-like family member 2 CELF2 ENSG00000048740 NA
10630 This gene encodes a type-I integral membrane glycoprotein with diverse distribution in human tissues. The physiological function of this protein may be related to its mucin-type character. The homologous protein in other species has been described as a differentiation antigen and influenza-virus receptor. The specific function of this protein has not been determined but it has been proposed as a marker of lung injury. Alternatively spliced transcript variants encoding different isoforms have been identified. podoplanin PDPN ENSG00000162493 NA
81932 NA haloacid dehalogenase like hydrolase domain containing 3 HDHD3 ENSG00000119431 NA
5010 This gene encodes a member of the claudin family. Claudins are integral membrane proteins and components of tight junction strands. Tight junction strands serve as a physical barrier to prevent solutes and water from passing freely through the paracellular space between epithelial or endothelial cell sheets, and also play critical roles in maintaining cell polarity and signal transductions. The protein encoded by this gene is a major component of central nervous system (CNS) myelin and plays an important role in regulating proliferation and migration of oligodendrocytes. Mouse studies showed that the gene deficiency results in deafness and loss of the Sertoli cell epithelial phenotype in the testis. This protein is a tight junction protein at the human blood-testis barrier (BTB), and the BTB disruption is related to a dysfunction of this gene. Alternatively spliced transcript variants encoding different isoforms have been identified. claudin 11 CLDN11 ENSG00000013297 NA
# Save the Factor 2 query IDs (Ensembl gene IDs, one per line) for downstream use.
write.table(out$query, paste0("../utilities/GTEX2013_sparse_fac_voom/gene_names_clus_", 2, ".txt"),
            col.names = FALSE, row.names = FALSE, quote = FALSE)
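
Some queries in the table above come back with `notfound = TRUE` (for example ENSG00000255824), and the file written above still includes those IDs. If only matched genes are wanted downstream, one option is to filter them first. The sketch below is a minimal illustration, not part of the original analysis: `matched_queries` is a hypothetical helper, and the toy data frame reuses three rows from the table so that the example runs without a network call.

```r
# Hypothetical helper (an assumption, not in the original analysis):
# return the query IDs that mygene matched, dropping rows flagged notfound = TRUE.
matched_queries <- function(out) {
  df <- as.data.frame(out)
  # The notfound column is only present when at least one query failed.
  if ("notfound" %in% names(df)) {
    df <- df[is.na(df$notfound) | !df$notfound, , drop = FALSE]
  }
  unique(as.character(df$query))
}

# Toy input built from three rows of the Factor 2 table above.
toy <- data.frame(
  query    = c("ENSG00000173156", "ENSG00000255824", "ENSG00000158874"),
  symbol   = c("RHOD", NA, "APOA2"),
  notfound = c(NA, TRUE, NA),
  stringsAsFactors = FALSE
)

ids <- matched_queries(toy)
print(ids)
```

In the real workflow, `matched_queries(out)` could replace `out$query` in the `write.table` call, assuming the unmatched IDs are not needed later.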

Factor 3 Annotations

out <- mygene::queryMany(gene_list[3, ], scopes = "ensembl.gene", fields = c("name", "summary", "symbol"), species = "human")
## Finished
## Pass returnall=TRUE to return lists of duplicate or missing query terms.
kable(as.data.frame(out))
name X_id symbol query summary notfound
UBE2R2 antisense RNA 1 ENSG00000235481 UBE2R2-AS1 ENSG00000235481 NA NA
myelin regulatory factor 745 MYRF ENSG00000124920 This gene encodes a transcription factor that is required for central nervous system myelination and may regulate oligodendrocyte differentiation. It is thought to act by increasing the expression of genes that effect myelin production but may also directly promote myelin gene expression. Loss of a similar gene in mouse models results in severe demyelination. Alternative splicing results in multiple transcript variants. NA
protease, serine 3 5646 PRSS3 ENSG00000010438 This gene encodes a trypsinogen, which is a member of the trypsin family of serine proteases. This enzyme is expressed in the brain and pancreas and is resistant to common trypsin inhibitors. It is active on peptide linkages involving the carboxyl group of lysine or arginine. This gene is localized to the locus of T cell receptor beta variable orphans on chromosome 9. Four transcript variants encoding different isoforms have been described for this gene. NA
glial fibrillary acidic protein 2670 GFAP ENSG00000131095 This gene encodes one of the major intermediate filament proteins of mature astrocytes. It is used as a marker to distinguish astrocytes from other glial cells during development. Mutations in this gene cause Alexander disease, a rare disorder of astrocytes in the central nervous system. Alternative splicing results in multiple transcript variants encoding distinct isoforms. NA
gastric inhibitory polypeptide receptor 2696 GIPR ENSG00000010310 This gene encodes a G-protein coupled receptor for gastric inhibitory polypeptide (GIP), which was originally identified as an activity in gut extracts that inhibited gastric acid secretion and gastrin release, but subsequently was demonstrated to stimulate insulin release in the presence of elevated glucose. Mice lacking this gene exhibit higher blood glucose levels with impaired initial insulin response after oral glucose load. Defect in this gene thus may contribute to the pathogenesis of diabetes. NA
aldolase, fructose-bisphosphate B 229 ALDOB ENSG00000136872 Fructose-1,6-bisphosphate aldolase (EC 4.1.2.13) is a tetrameric glycolytic enzyme that catalyzes the reversible conversion of fructose-1,6-bisphosphate to glyceraldehyde 3-phosphate and dihydroxyacetone phosphate. Vertebrates have 3 aldolase isozymes which are distinguished by their electrophoretic and catalytic properties. Differences indicate that aldolases A, B, and C are distinct proteins, the products of a family of related ‘housekeeping’ genes exhibiting developmentally regulated expression of the different isozymes. The developing embryo produces aldolase A, which is produced in even greater amounts in adult muscle where it can be as much as 5% of total cellular protein. In adult liver, kidney and intestine, aldolase A expression is repressed and aldolase B is produced. In brain and other nervous tissue, aldolase A and C are expressed about equally. There is a high degree of homology between aldolase A and C. Defects in ALDOB cause hereditary fructose intolerance. NA
chemerin chemokine-like receptor 1 1240 CMKLR1 ENSG00000174600 NA NA
hepcidin antimicrobial peptide 57817 HAMP ENSG00000105697 The product encoded by this gene is involved in the maintenance of iron homeostasis, and it is necessary for the regulation of iron storage in macrophages, and for intestinal iron absorption. The preproprotein is post-translationally cleaved into mature peptides of 20, 22 and 25 amino acids, and these active peptides are rich in cysteines, which form intramolecular bonds that stabilize their beta-sheet structures. These peptides exhibit antimicrobial activity against bacteria and fungi. Mutations in this gene cause hemochromatosis type 2B, also known as juvenile hemochromatosis, a disease caused by severe iron overload that results in cardiomyopathy, cirrhosis, and endocrine failure. NA
fibrillarin-like 1 ENSG00000188573 FBLL1 ENSG00000188573 NA NA
calmegin 1047 CLGN ENSG00000153132 Calmegin is a testis-specific endoplasmic reticulum chaperone protein. CLGN may play a role in spermatogenesis and infertility. NA
aspartate beta-hydroxylase domain containing 1 253982 ASPHD1 ENSG00000174939 NA NA
solute carrier family 16 member 9 220963 SLC16A9 ENSG00000165449 NA NA
integrin subunit alpha 8 8516 ITGA8 ENSG00000077943 Integrins are heterodimeric transmembrane receptor proteins that mediate numerous cellular processes including cell adhesion, cytoskeletal rearrangement, and activation of cell signaling pathways. Integrins are composed of alpha and beta subunits. This gene encodes the alpha 8 subunit of the heterodimeric integrin alpha8beta1 protein. The encoded protein is a single-pass type 1 membrane protein that contains multiple FG-GAP repeats. This repeat is predicted to fold into a beta propeller structure. This gene regulates the recruitment of mesenchymal cells into epithelial structures, mediates cell-cell interactions, and regulates neurite outgrowth of sensory and motor neurons. The integrin alpha8beta1 protein thus plays an important role in wound-healing and organogenesis. Mutations in this gene have been associated with renal hypodysplasia/aplasia-1 (RHDA1) and with several animal models of chronic kidney disease. Alternate splicing results in multiple transcript variants encoding distinct isoforms. NA
synuclein beta 6620 SNCB ENSG00000074317 This gene encodes a member of a small family of proteins that inhibit phospholipase D2 and may function in neuronal plasticity. The encoded protein is abundant in lesions of patients with Alzheimer disease. A mutation in this gene was found in individuals with dementia with Lewy bodies. Alternative splicing results in multiple transcript variants. NA
amine oxidase, copper containing 3 8639 AOC3 ENSG00000131471 This gene encodes a member of the semicarbazide-sensitive amine oxidase family. Copper amine oxidases catalyze the oxidative conversion of amines to aldehydes in the presence of copper and quinone cofactor. The encoded protein is localized to the cell surface, has adhesive properties as well as monoamine oxidase activity, and may be involved in leukocyte trafficking. Alterations in levels of the encoded protein may be associated with many diseases, including diabetes mellitus. A pseudogene of this gene has been described and is located approximately 9-kb downstream on the same chromosome. Alternative splicing results in multiple transcript variants. NA
protein phosphatase 1 regulatory inhibitor subunit 1B 84152 PPP1R1B ENSG00000131771 This gene encodes a bifunctional signal transduction molecule. Dopaminergic and glutamatergic receptor stimulation regulates its phosphorylation and function as a kinase or phosphatase inhibitor. As a target for dopamine, this gene may serve as a therapeutic target for neurologic and psychiatric disorders. Multiple transcript variants encoding different isoforms have been found for this gene. NA
myocyte enhancer factor 2C 4208 MEF2C ENSG00000081189 This locus encodes a member of the MADS box transcription enhancer factor 2 (MEF2) family of proteins, which play a role in myogenesis. The encoded protein, MEF2 polypeptide C, has both trans-activating and DNA binding activities. This protein may play a role in maintaining the differentiated state of muscle cells. Mutations and deletions at this locus have been associated with severe mental retardation, stereotypic movements, epilepsy, and cerebral malformation. Alternatively spliced transcript variants have been described. NA
NA ENSG00000245864 CTC-467M3.1 ENSG00000245864 NA NA
RUN domain containing 3A 10900 RUNDC3A ENSG00000108309 NA NA
poly(A) binding protein interacting protein 2B 400961 PAIP2B ENSG00000124374 Most mRNAs, except for histones, contain a 3-prime poly(A) tail. Poly(A)-binding protein (PABP; see MIM 604679) enhances translation by circularizing mRNA through its interaction with the translation initiation factor EIF4G1 (MIM 600495) and the poly(A) tail. Various PABP-binding proteins regulate PABP activity, including PAIP1 (MIM 605184), a translational stimulator, and PAIP2A (MIM 605604) and PAIP2B, translational inhibitors (Derry et al., 2006 [PubMed 17381337]). NA
NA ENSG00000251660 AC007036.5 ENSG00000251660 NA NA
glycine amidinotransferase 2628 GATM ENSG00000171766 This gene encodes a mitochondrial enzyme that belongs to the amidinotransferase family. This enzyme is involved in creatine biosynthesis, whereby it catalyzes the transfer of a guanido group from L-arginine to glycine, resulting in guanidinoacetic acid, the immediate precursor of creatine. Mutations in this gene cause arginine:glycine amidinotransferase deficiency, an inborn error of creatine synthesis characterized by mental retardation, language impairment, and behavioral disorders. NA
protein disulfide isomerase family A member 2 64714 PDIA2 ENSG00000185615 Protein disulfide isomerases (EC 5.3.4.1), such as PDIP, are endoplasmic reticulum (ER) resident proteins that catalyze protein folding and thiol-disulfide interchange reactions (Desilva et al., 1996 [PubMed 8561901]). NA
ITPR1 antisense RNA 1 (head to head) ENSG00000231249 ITPR1-AS1 ENSG00000231249 NA NA
spermine oxidase 54498 SMOX ENSG00000088826 Polyamines are ubiquitous polycationic alkylamines which include spermine, spermidine, putrescine, and agmatine. These molecules participate in a broad range of cellular functions which include cell cycle modulation, scavenging reactive oxygen species, and the control of gene expression. These molecules also play important roles in neurotransmission through their regulation of cell-surface receptor activity, involvement in intracellular signalling pathways, and their putative roles as neurotransmitters. This gene encodes an FAD-containing enzyme that catalyzes the oxidation of spermine to spermadine and secondarily produces hydrogen peroxide. Multiple transcript variants encoding different isoenzymes have been identified for this gene, some of which have failed to demonstrate significant oxidase activity on natural polyamine substrates. The characterized isoenzymes have distinctive biochemical characteristics and substrate specificities, suggesting the existence of additional levels of complexity in polyamine catabolism. NA
keratin 8 3856 KRT8 ENSG00000170421 This gene is a member of the type II keratin family clustered on the long arm of chromosome 12. Type I and type II keratins heteropolymerize to form intermediate-sized filaments in the cytoplasm of epithelial cells. The product of this gene typically dimerizes with keratin 18 to form an intermediate filament in simple single-layered epithelial cells. This protein plays a role in maintaining cellular structural integrity and also functions in signal transduction and cellular differentiation. Mutations in this gene cause cryptogenic cirrhosis. Alternatively spliced transcript variants have been found for this gene. NA
NA ENSG00000255498 RP11-618K13.2 ENSG00000255498 NA NA
NA ENSG00000266844 RP11-862L9.3 ENSG00000266844 NA NA
integrin subunit alpha 3 3675 ITGA3 ENSG00000005884 The gene encodes a member of the integrin alpha chain family of proteins. Integrins are heterodimeric integral membrane proteins composed of an alpha chain and a beta chain that function as cell surface adhesion molecules. The encoded preproprotein is proteolytically processed to generate light and heavy chains that comprise the alpha 3 subunit. This subunit joins with a beta 1 subunit to form an integrin that interacts with extracellular matrix proteins including members of the laminin family. Expression of this gene may be correlated with breast cancer metastasis. NA
chromosome 2 open reading frame 82 389084 C2orf82 ENSG00000182600 NA NA
NA NA NA ENSG00000165862 NA TRUE
matrix Gla protein 4256 MGP ENSG00000111341 The protein encoded by this gene is secreted and likely acts as an inhibitor of bone formation. The encoded protein is found in the organic matrix of bone and cartilage. Defects in this gene are a cause of Keutel syndrome (KS). Two transcript variants encoding different isoforms have been found for this gene. NA
myozenin 1 58529 MYOZ1 ENSG00000177791 The protein encoded by this gene is primarily expressed in the skeletal muscle, and belongs to the myozenin family. Members of this family function as calcineurin-interacting proteins that help tether calcineurin to the sarcomere of cardiac and skeletal muscle. They play an important role in modulation of calcineurin signaling. NA
energy homeostasis associated 375704 ENHO ENSG00000168913 NA NA
NA ENSG00000269906 RP11-248J18.2 ENSG00000269906 NA NA
maturin, neural progenitor differentiation regulator homolog (Xenopus) 222166 MTURN ENSG00000180354 NA NA
solute carrier family 47 member 1 55244 SLC47A1 ENSG00000142494 This gene is located within the Smith-Magenis syndrome region on chromosome 17. It encodes a protein of unknown function. NA
keratin 7 3855 KRT7 ENSG00000135480 The protein encoded by this gene is a member of the keratin gene family. The type II cytokeratins consist of basic or neutral proteins which are arranged in pairs of heterotypic keratin chains coexpressed during differentiation of simple and stratified epithelial tissues. This type II cytokeratin is specifically expressed in the simple epithelia lining the cavities of the internal organs and in the gland ducts and blood vessels. The genes encoding the type II cytokeratins are clustered in a region of chromosome 12q12-q13. Alternative splicing may result in several transcript variants; however, not all variants have been fully described. NA
NA ENSG00000269514 RP11-370I10.12 ENSG00000269514 NA NA
mucin 7, secreted 4589 MUC7 ENSG00000171195 This gene encodes a small salivary mucin, which is thought to play a role in facilitating the clearance of bacteria in the oral cavity and to aid in mastication, speech, and swallowing. The central domain of this glycoprotein contains tandem repeats, each composed of 23 amino acids. This antimicrobial protein has antibacterial and antifungal activity. The most common allele contains 6 repeats, and some alleles may be associated with susceptibility to asthma. Alternatively spliced transcript variants with different 5’ UTR, but encoding the same protein, have been found for this gene. NA
solute carrier family 37 member 2 219855 SLC37A2 ENSG00000134955 NA NA
natriuretic peptide receptor 1 4881 NPR1 ENSG00000169418 Guanylyl cyclases, catalyzing the production of cGMP from GTP, are classified as soluble and membrane forms (Garbers and Lowe, 1994 [PubMed 7982997]). The membrane guanylyl cyclases, often termed guanylyl cyclases A through F, form a family of cell-surface receptors with a similar topographic structure: an extracellular ligand-binding domain, a single membrane-spanning domain, and an intracellular region that contains a protein kinase-like domain and a cyclase catalytic domain. GC-A and GC-B function as receptors for natriuretic peptides; they are also referred to as atrial natriuretic peptide receptor A (NPR1) and type B (NPR2; MIM 108961). Also see NPR3 (MIM 108962), which encodes a protein with only the ligand-binding transmembrane and 37-amino acid cytoplasmic domains. NPR1 is a membrane-bound guanylate cyclase that serves as the receptor for both atrial and brain natriuretic peptides (ANP (MIM 108780) and BNP (MIM 600295), respectively). NA
interleukin 15 3600 IL15 ENSG00000164136 The protein encoded by this gene is a cytokine that regulates T and natural killer cell activation and proliferation. This cytokine and interleukin 2 share many biological activities. They are found to bind common hematopoietin receptor subunits, and may compete for the same receptor, and thus negatively regulate each other’s activity. The number of CD8+ memory cells is shown to be controlled by a balance between this cytokine and IL2. This cytokine induces the activation of JAK kinases, as well as the phosphorylation and activation of transcription activators STAT3, STAT5, and STAT6. Studies of the mouse counterpart suggested that this cytokine may increase the expression of apoptosis inhibitor BCL2L1/BCL-x(L), possibly through the transcription activation activity of STAT6, and thus prevent apoptosis. Alternatively spliced transcript variants of this gene have been reported. NA
neuralized E3 ubiquitin protein ligase 1 9148 NEURL1 ENSG00000107954 NA NA
pellino E3 ubiquitin protein ligase family member 2 57161 PELI2 ENSG00000139946 NA NA
C-X-C motif chemokine ligand 1 2919 CXCL1 ENSG00000163739 This antimicrobial gene encodes a member of the CXC subfamily of chemokines. The encoded protein is a secreted growth factor that signals through the G-protein coupled receptor, CXC receptor 2. This protein plays a role in inflammation and as a chemoattractant for neutrophils. Aberrant expression of this protein is associated with the growth and progression of certain tumors. A naturally occurring processed form of this protein has increased chemotactic activity. Alternate splicing results in coding and non-coding variants of this gene. A pseudogene of this gene is found on chromosome 4. NA
arylsulfatase G 22901 ARSG ENSG00000141337 The protein encoded by this gene belongs to the sulfatase enzyme family. Sulfatases hydrolyze sulfate esters from sulfated steroids, carbohydrates, proteoglycans, and glycolipids. They are involved in hormone biosynthesis, modulation of cell signaling, and degradation of macromolecules. This protein displays arylsulfatase activity at acidic pH, as is typical of lysosomal sulfatases, and has been shown to localize in the lysosomes. Alternatively spliced transcript variants have been found for this gene. NA
pentraxin 3 5806 PTX3 ENSG00000163661 NA NA
thromboxane A2 receptor 6915 TBXA2R ENSG00000006638 This gene encodes a member of the G protein-coupled receptor family. The protein interacts with thromboxane A2 to induce platelet aggregation and regulate hemostasis. A mutation in this gene results in a bleeding disorder. Multiple transcript variants encoding different isoforms have been found for this gene. NA
collagen type IV alpha 4 chain 1286 COL4A4 ENSG00000081052 This gene encodes one of the six subunits of type IV collagen, the major structural component of basement membranes. This particular collagen IV subunit, however, is only found in a subset of basement membranes. Like the other members of the type IV collagen gene family, this gene is organized in a head-to-head conformation with another type IV collagen gene so that each gene pair shares a common promoter. Mutations in this gene are associated with type II autosomal recessive Alport syndrome (hereditary glomerulonephropathy) and with familial benign hematuria (thin basement membrane disease). Two transcripts, differing only in their transcription start sites, have been identified for this gene and, as is common for collagen genes, multiple polyadenylation sites are found in the 3’ UTR. NA
transmembrane protease, serine 5 80975 TMPRSS5 ENSG00000166682 This gene encodes a protein that belongs to the serine protease family. Serine proteases are known to be involved in many physiological and pathological processes. Alternative splicing results in multiple transcript variants. NA
transthyretin 7276 TTR ENSG00000118271 This gene encodes transthyretin, one of the three prealbumins including alpha-1-antitrypsin, transthyretin and orosomucoid. Transthyretin is a carrier protein; it transports thyroid hormones in the plasma and cerebrospinal fluid, and also transports retinol (vitamin A) in the plasma. The protein consists of a tetramer of identical subunits. More than 80 different mutations in this gene have been reported; most mutations are related to amyloid deposition, affecting predominantly peripheral nerve and/or the heart, and a small portion of the gene mutations is non-amyloidogenic. The diseases caused by mutations include amyloidotic polyneuropathy, euthyroid hyperthyroxinaemia, amyloidotic vitreous opacities, cardiomyopathy, oculoleptomeningeal amyloidosis, meningocerebrovascular amyloidosis, carpal tunnel syndrome, etc. NA
solute carrier family 7 member 5 8140 SLC7A5 ENSG00000103257 NA NA
plakophilin 2 5318 PKP2 ENSG00000057294 This gene encodes a member of the arm-repeat (armadillo) and plakophilin gene families. Plakophilin proteins contain numerous armadillo repeats, localize to cell desmosomes and nuclei, and participate in linking cadherins to intermediate filaments in the cytoskeleton. This gene product may regulate the signaling activity of beta-catenin. Two alternately spliced transcripts encoding two protein isoforms have been identified. A processed pseudogene with high similarity to this locus has been mapped to chromosome 12p13. NA
leucine rich alpha-2-glycoprotein 1 116844 LRG1 ENSG00000171236 The leucine-rich repeat (LRR) family of proteins, including LRG1, have been shown to be involved in protein-protein interaction, signal transduction, and cell adhesion and development. LRG1 is expressed during granulocyte differentiation (O’Donnell et al., 2002 [PubMed 12223515]). NA
ADAM metallopeptidase with thrombospondin type 1 motif 7 11173 ADAMTS7 ENSG00000136378 The protein encoded by this gene is a member of the ADAMTS (a disintegrin and metalloproteinase with thrombospondin motifs) family. Members of this family share several distinct protein modules, including a propeptide region, a metalloproteinase domain, a disintegrin-like domain, and a thrombospondin type 1 (TS) motif. Individual members of this family differ in the number of C-terminal TS motifs, and some have unique C-terminal domains. The encoded preproprotein is proteolytically processed to generate the mature enzyme. This enzyme contains two C-terminal TS motifs and may regulate vascular smooth muscle cell (VSMC) migration. Mutations in this gene may be associated with susceptibility to coronary artery disease. NA
alpha-2-glycoprotein 1, zinc-binding 563 AZGP1 ENSG00000160862 NA NA
G protein-coupled receptor kinase 5 2869 GRK5 ENSG00000198873 This gene encodes a member of the guanine nucleotide-binding protein (G protein)-coupled receptor kinase subfamily of the Ser/Thr protein kinase family. The protein phosphorylates the activated forms of G protein-coupled receptors thus initiating their deactivation. It has also been shown to play a role in regulating the motility of polymorphonuclear leukocytes (PMNs). NA
myosin, heavy chain 10, non-muscle 4628 MYH10 ENSG00000133026 This gene encodes a member of the myosin superfamily. The protein represents a conventional non-muscle myosin; it should not be confused with the unconventional myosin-10 (MYO10). Myosins are actin-dependent motor proteins with diverse functions including regulation of cytokinesis, cell motility, and cell polarity. Mutations in this gene have been associated with May-Hegglin anomaly and developmental defects in brain and heart. Multiple transcript variants encoding different isoforms have been found for this gene. NA
chromogranin A 1113 CHGA ENSG00000100604 The protein encoded by this gene is a member of the chromogranin/secretogranin family of neuroendocrine secretory proteins. It is found in secretory vesicles of neurons and endocrine cells. This gene product is a precursor to three biologically active peptides; vasostatin, pancreastatin, and parastatin. These peptides act as autocrine or paracrine negative modulators of the neuroendocrine system. Two other peptides, catestatin and chromofungin, have antimicrobial activity and antifungal activity, respectively. Two transcript variants encoding different isoforms have been found for this gene. NA
NA ENSG00000224459 RP11-169K16.4 ENSG00000224459 NA NA
aldo-keto reductase family 7 member A3 22977 AKR7A3 ENSG00000162482 Aldo-keto reductases, such as AKR7A3, are involved in the detoxification of aldehydes and ketones. NA
cadherin 1 999 CDH1 ENSG00000039068 This gene encodes a classical cadherin of the cadherin superfamily. Alternative splicing results in multiple transcript variants, at least one of which encodes a preproprotein that is proteolytically processed to generate the mature glycoprotein. This calcium-dependent cell-cell adhesion protein is comprised of five extracellular cadherin repeats, a transmembrane region and a highly conserved cytoplasmic tail. Mutations in this gene are correlated with gastric, breast, colorectal, thyroid and ovarian cancer. Loss of function of this gene is thought to contribute to cancer progression by increasing proliferation, invasion, and/or metastasis. The ectodomain of this protein mediates bacterial adhesion to mammalian cells and the cytoplasmic domain is required for internalization. This gene is present in a gene cluster with other members of the cadherin family on chromosome 16. NA
chitinase 3 like 1 1116 CHI3L1 ENSG00000133048 Chitinases catalyze the hydrolysis of chitin, which is an abundant glycopolymer found in insect exoskeletons and fungal cell walls. The glycoside hydrolase 18 family of chitinases includes eight human family members. This gene encodes a glycoprotein member of the glycosyl hydrolase 18 family. The protein lacks chitinase activity and is secreted by activated macrophages, chondrocytes, neutrophils and synovial cells. The protein is thought to play a role in the process of inflammation and tissue remodeling. NA
thyroglobulin 7038 TG ENSG00000042832 Thyroglobulin (Tg) is a glycoprotein homodimer produced predominantly by the thyroid gland. It acts as a substrate for the synthesis of thyroxine and triiodothyronine as well as the storage of the inactive forms of thyroid hormone and iodine. Thyroglobulin is secreted from the endoplasmic reticulum to its site of iodination, and subsequent thyroxine biosynthesis, in the follicular lumen. Mutations in this gene cause thyroid dyshormonogenesis, manifested as goiter, and are associated with moderate to severe congenital hypothyroidism. Polymorphisms in this gene are associated with susceptibility to autoimmune thyroid diseases (AITD) such as Graves disease and Hashimoto thyroiditis. NA
mitogen-activated protein kinase 8 interacting protein 1 9479 MAPK8IP1 ENSG00000121653 This gene encodes a regulator of the pancreatic beta-cell function. It is highly similar to JIP-1, a mouse protein known to be a regulator of c-Jun amino-terminal kinase (Mapk8). This protein has been shown to prevent MAPK8 mediated activation of transcription factors, and to decrease IL-1 beta and MAP kinase kinase 1 (MEKK1) induced apoptosis in pancreatic beta cells. This protein also functions as a DNA-binding transactivator of the glucose transporter GLUT2. RE1-silencing transcription factor (REST) is reported to repress the expression of this gene in insulin-secreting beta cells. This gene is found to be mutated in a type 2 diabetes family, and thus is thought to be a susceptibility gene for type 2 diabetes. NA
family with sequence similarity 134 member B 54463 FAM134B ENSG00000154153 The protein encoded by this gene is a cis-Golgi transmembrane protein that may be necessary for the long-term survival of nociceptive and autonomic ganglion neurons. Mutations in this gene are a cause of hereditary sensory and autonomic neuropathy type IIB (HSAN IIB), and this gene may also play a role in susceptibility to vascular dementia. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene. NA
growth arrest specific 5 (non-protein coding) 60674 GAS5 ENSG00000234741 This gene produces a spliced long non-coding RNA and is a member of the 5’ terminal oligo-pyrimidine class of genes. It is a small nucleolar RNA host gene, containing multiple C/D box snoRNA genes in its introns. Part of the secondary RNA structure of the encoded transcript mimics glucocorticoid response element (GRE) which means it can bind to the DNA binding domain of the glucocorticoid receptor (nuclear receptor subfamily 3, group C, member 1). This action blocks the glucocorticoid receptor from being activated and thereby stops it from regulating the transcription of its target genes. This transcript is also thought to regulate the transcriptional activity of other receptors, such as androgen, progesterone and mineralocorticoid receptors, that can bind to its GRE mimic region. Multiple functions have been associated with this transcript, including cellular growth arrest and apoptosis. It has also been identified as a potential tumor suppressor, with its down-regulation associated with cancer in multiple different tissues. NA
angiotensinogen 183 AGT ENSG00000135744 The protein encoded by this gene, pre-angiotensinogen or angiotensinogen precursor, is expressed in the liver and is cleaved by the enzyme renin in response to lowered blood pressure. The resulting product, angiotensin I, is then cleaved by angiotensin converting enzyme (ACE) to generate the physiologically active enzyme angiotensin II. The protein is involved in maintaining blood pressure and in the pathogenesis of essential hypertension and preeclampsia. Mutations in this gene are associated with susceptibility to essential hypertension, and can cause renal tubular dysgenesis, a severe disorder of renal tubular development. Defects in this gene have also been associated with non-familial structural atrial fibrillation, and inflammatory bowel disease. NA
kinesin family member 1A 547 KIF1A ENSG00000130294 The protein encoded by this gene is a member of the kinesin family and functions as an anterograde motor protein that transports membranous organelles along axonal microtubules. Mutations at this locus have been associated with spastic paraplegia-30 and hereditary sensory neuropathy IIC. Alternatively spliced transcript variants encoding distinct isoforms have been described. NA
immunoglobulin heavy constant gamma 1 (G1m marker) ENSG00000211896 IGHG1 ENSG00000211896 NA NA
retinol dehydrogenase 10 (all-trans) 157506 RDH10 ENSG00000121039 This gene encodes a retinol dehydrogenase, which converts all-trans-retinol to all-trans-retinal, with preference for NADP as a cofactor. Studies in mice suggest that this protein is essential for synthesis of embryonic retinoic acid and is required for limb, craniofacial, and organ development. NA
ATP binding cassette subfamily A member 1 19 ABCA1 ENSG00000165029 The membrane-associated protein encoded by this gene is a member of the superfamily of ATP-binding cassette (ABC) transporters. ABC proteins transport various molecules across extra- and intracellular membranes. ABC genes are divided into seven distinct subfamilies (ABC1, MDR/TAP, MRP, ALD, OABP, GCN20, White). This protein is a member of the ABC1 subfamily. Members of the ABC1 subfamily comprise the only major ABC subfamily found exclusively in multicellular eukaryotes. With cholesterol as its substrate, this protein functions as a cholesterol efflux pump in the cellular lipid removal pathway. Mutations in this gene have been associated with Tangier’s disease and familial high-density lipoprotein deficiency. NA
pleckstrin and Sec7 domain containing 2 84249 PSD2 ENSG00000146005 NA NA
NA NA NA ENSG00000203306 NA TRUE
solute carrier family 18 member A2 6571 SLC18A2 ENSG00000165646 The vesicular monoamine transporter acts to accumulate cytosolic monoamines into synaptic vesicles, using the proton gradient maintained across the synaptic vesicular membrane. Its proper function is essential to the correct activity of the monoaminergic systems that have been implicated in several human neuropsychiatric disorders. The transporter is a site of action of important drugs, including reserpine and tetrabenazine (summary by Peter et al., 1993 [PubMed 7905859]). See also SLC18A1 (MIM 193002). NA
indolethylamine N-methyltransferase 11185 INMT ENSG00000241644 N-methylation of endogenous and xenobiotic compounds is a major method by which they are degraded. This gene encodes an enzyme that N-methylates indoles such as tryptamine. Alternative splicing results in multiple transcript variants. Read-through transcription also exists between this gene and the downstream FAM188B (family with sequence similarity 188, member B) gene. NA
insulin like 3 3640 INSL3 ENSG00000248099 This gene encodes a member of the insulin-like hormone superfamily. The encoded protein is mainly produced in gonadal tissues. Studies of the mouse counterpart suggest that this gene may be involved in the development of urogenital tract and female fertility. This protein may also act as a hormone to regulate growth and differentiation of gubernaculum, and thus mediating intra-abdominal testicular descent. Mutations in this gene may lead to cryptorchidism. Alternate splicing results in multiple transcript variants. NA
immunoglobulin lambda like polypeptide 5 100423062 IGLL5 ENSG00000254709 This gene encodes one of the immunoglobulin lambda-like polypeptides. It is located within the immunoglobulin lambda locus but it does not require somatic rearrangement for expression. The first exon of this gene is unrelated to immunoglobulin variable genes; the second and third exons are the immunoglobulin lambda joining 1 and the immunoglobulin lambda constant 1 gene segments. Alternative splicing results in multiple transcript variants. NA
chymotrypsin like 1506 CTRL ENSG00000141086 NA NA
essential meiotic structure-specific endonuclease subunit 2 197342 EME2 ENSG00000197774 EME2 forms a heterodimer with MUS81 (MIM 606591) that functions as an XPF (MIM 278760)-type flap/fork endonuclease in DNA repair (Ciccia et al., 2007 [PubMed 17289582]). NA
inositol 1,4,5-trisphosphate receptor type 1 3708 ITPR1 ENSG00000150995 This gene encodes an intracellular receptor for inositol 1,4,5-trisphosphate. Upon stimulation by inositol 1,4,5-trisphosphate, this receptor mediates calcium release from the endoplasmic reticulum. Mutations in this gene cause spinocerebellar ataxia type 15, a disease associated with an heterogeneous group of cerebellar disorders. Multiple transcript variants have been identified for this gene. NA
lipocalin 2 3934 LCN2 ENSG00000148346 This gene encodes a protein that belongs to the lipocalin family. Members of this family transport small hydrophobic molecules such as lipids, steroid hormones and retinoids. The protein encoded by this gene is a neutrophil gelatinase-associated lipocalin and plays a role in innate immunity by limiting bacterial growth as a result of sequestering iron-containing siderophores. The presence of this protein in blood and urine is an early biomarker of acute kidney injury. This protein is thought to be involved in multiple cellular processes, including maintenance of skin homeostasis, and suppression of invasiveness and metastasis. Mice lacking this gene are more susceptible to bacterial infection than wild type mice. NA
asparaginase like 1 80150 ASRGL1 ENSG00000162174 NA NA
ANO1 antisense RNA 1 ENSG00000254902 ANO1-AS1 ENSG00000254902 NA NA
visinin like 1 7447 VSNL1 ENSG00000163032 This gene is a member of the visinin/recoverin subfamily of neuronal calcium sensor proteins. The encoded protein is strongly expressed in granule cells of the cerebellum where it associates with membranes in a calcium-dependent manner and modulates intracellular signaling pathways of the central nervous system by directly or indirectly regulating the activity of adenylyl cyclase. Alternatively spliced transcript variants have been observed, but their full-length nature has not been determined. NA
v-myc avian myelocytomatosis viral oncogene lung carcinoma derived homolog 4610 MYCL ENSG00000116990 NA NA
heat shock protein family B (small) member 7 27129 HSPB7 ENSG00000173641 NA NA
heparan sulfate-glucosamine 3-sulfotransferase 3B1 9953 HS3ST3B1 ENSG00000125430 The protein encoded by this gene is a type II integral membrane protein that belongs to the 3-O-sulfotransferases family. These proteins catalyze the addition of sulfate groups at the 3-OH position of glucosamine in heparan sulfate. The substrate specificity of individual members of the family is based on prior modification of the heparan sulfate chain, thus allowing different members of the family to generate binding sites for different proteins on the same heparan sulfate chain. Following treatment with a histone deacetylase inhibitor, expression of this gene is activated in a pancreatic cell line. The increased expression results in promotion of the epithelial-mesenchymal transition. In addition, the modification catalyzed by this protein allows herpes simplex virus membrane fusion and penetration. A very closely related homolog with an almost identical sulfotransferase domain maps less than 1 Mb away. Alternative splicing results in multiple transcript variants. NA
mitochondrial elongation factor 2 125170 MIEF2 ENSG00000177427 This gene encodes an outer mitochondrial membrane protein that functions in the regulation of mitochondrial morphology. It can directly recruit the fission mediator dynamin-related protein 1 (Drp1) to the mitochondrial surface. The gene is located within the Smith-Magenis syndrome region on chromosome 17. Alternative splicing results in multiple transcript variants encoding different isoforms. NA
thyroid peroxidase 7173 TPO ENSG00000115705 This gene encodes a membrane-bound glycoprotein. The encoded protein acts as an enzyme and plays a central role in thyroid gland function. The protein functions in the iodination of tyrosine residues in thyroglobulin and phenoxy-ester formation between pairs of iodinated tyrosines to generate the thyroid hormones, thyroxine and triiodothyronine. Mutations in this gene are associated with several disorders of thyroid hormonogenesis, including congenital hypothyroidism, congenital goiter, and thyroid hormone organification defect IIA. Multiple transcript variants encoding distinct isoforms have been identified for this gene, but the full-length nature of some variants has not been determined. NA
synaptonemal complex central element protein 1 93426 SYCE1 ENSG00000171772 NA NA
CUB and zona pellucida like domains 1 50624 CUZD1 ENSG00000138161 NA NA
LIM domain binding 3 11155 LDB3 ENSG00000122367 This gene encodes a PDZ domain-containing protein. PDZ motifs are modular protein-protein interaction domains consisting of 80-120 amino acid residues. PDZ domain-containing proteins interact with each other in cytoskeletal assembly or with other proteins involved in targeting and clustering of membrane proteins. The protein encoded by this gene interacts with alpha-actinin-2 through its N-terminal PDZ domain and with protein kinase C via its C-terminal LIM domains. The LIM domain is a cysteine-rich motif defined by 50-60 amino acids containing two zinc-binding modules. This protein also interacts with all three members of the myozenin family. Mutations in this gene have been associated with myofibrillar myopathy and dilated cardiomyopathy. Alternatively spliced transcript variants encoding different isoforms have been identified; all isoforms have N-terminal PDZ domains while only longer isoforms (1, 2 and 5) have C-terminal LIM domains. NA
neurexophilin 3 11248 NXPH3 ENSG00000182575 NA NA
N-myc downstream regulated 1 10397 NDRG1 ENSG00000104419 This gene is a member of the N-myc downregulated gene family which belongs to the alpha/beta hydrolase superfamily. The protein encoded by this gene is a cytoplasmic protein involved in stress responses, hormone responses, cell growth, and differentiation. The encoded protein is necessary for p53-mediated caspase activation and apoptosis. Mutations in this gene are a cause of Charcot-Marie-Tooth disease type 4D, and expression of this gene may be a prognostic indicator for several types of cancer. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene. NA
geminin, DNA replication inhibitor 51053 GMNN ENSG00000112312 This gene encodes a protein that plays a critical role in cell cycle regulation. The encoded protein inhibits DNA replication by binding to DNA replication factor Cdt1, preventing the incorporation of minichromosome maintenance proteins into the pre-replication complex. The encoded protein is expressed during the S and G2 phases of the cell cycle and is degraded by the anaphase-promoting complex during the metaphase-anaphase transition. Increased expression of this gene may play a role in several malignancies including colon, rectal and breast cancer. Alternatively spliced transcript variants have been observed for this gene, and two pseudogenes of this gene are located on the short arm of chromosome 16. NA
cornichon family AMPA receptor auxiliary protein 2 254263 CNIH2 ENSG00000174871 The protein encoded by this gene is an auxiliary subunit of the ionotropic glutamate receptor of the AMPA subtype. AMPA receptors mediate fast synaptic neurotransmission in the central nervous system. This protein has been reported to interact with the Type I AMPA receptor regulatory protein isoform gamma-8 to control assembly of hippocampal AMPA receptor complexes, thereby modulating receptor gating and pharmacology. Alternative splicing results in multiple transcript variants. NA
eukaryotic translation elongation factor 1 beta 2 pseudogene 2 ENSG00000213864 EEF1B2P2 ENSG00000213864 NA NA
frizzled class receptor 1 8321 FZD1 ENSG00000157240 Members of the ‘frizzled’ gene family encode 7-transmembrane domain proteins that are receptors for Wnt signaling proteins. The FZD1 protein contains a signal peptide, a cysteine-rich domain in the N-terminal extracellular region, 7 transmembrane domains, and a C-terminal PDZ domain-binding motif. The FZD1 transcript is expressed in various tissues. NA
# Write the Ensembl gene IDs queried for factor 3, one per line.
write.table(as.factor(out$query), paste0("../utilities/GTEX2013_sparse_fac_voom/gene_names_clus_", 3, ".txt"),
            col.names = FALSE, row.names = FALSE, quote = FALSE);
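
The annotation tables contain rows flagged `notfound = TRUE` (for example ENSG00000203306), and a single query can return more than one hit (ENSG00000182230 maps to both LOC100507387 and FAM153B in the Factor 4 table). Writing `out$query` directly therefore keeps unmatched and repeated IDs. The sketch below shows one possible way to filter the IDs before writing them. It is not part of the original analysis: `clean_queries` is a hypothetical helper, `out_df` is a small hand-built stand-in for `as.data.frame(out)` using rows copied from the tables in this document, and the output goes to a temporary file rather than the real output path.

```r
# Toy stand-in for as.data.frame(out); values copied from the tables in this document.
out_df <- data.frame(
  query    = c("ENSG00000177791", "ENSG00000168913", "ENSG00000203306",
               "ENSG00000182230", "ENSG00000182230"),
  symbol   = c("MYOZ1", "ENHO", NA, "LOC100507387", "FAM153B"),
  notfound = c(NA, NA, TRUE, NA, NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep only matched queries, one entry per query ID.
clean_queries <- function(df) {
  if (is.null(df$notfound)) {
    # Assumption: the notfound column may be absent when every query matched.
    found <- rep(TRUE, nrow(df))
  } else {
    # notfound is NA for matched rows and TRUE for unmatched ones.
    found <- is.na(df$notfound) | !df$notfound
  }
  unique(df$query[found])
}

ids <- clean_queries(out_df)

# Write to a temporary file; the real destination is an analysis-specific choice.
tmp <- tempfile(fileext = ".txt")
write.table(ids, tmp, col.names = FALSE, row.names = FALSE, quote = FALSE)
```

Whether unmatched IDs should be dropped depends on the downstream tool: if it accepts raw Ensembl IDs, keeping them may be preferable, and only the `unique()` step would be needed.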

Factor 4 Annotations

out <- mygene::queryMany(gene_list[4,],  scopes="ensembl.gene", fields=c("name", "summary", "symbol"), species="human");
## Finished
## Pass returnall=TRUE to return lists of duplicate or missing query terms.
kable(as.data.frame(out))
X_id symbol query name summary notfound
3512 JCHAIN ENSG00000132465 joining chain of multimeric IgA and IgM NA NA
ENSG00000211899 IGHM ENSG00000211899 immunoglobulin heavy constant mu NA NA
ENSG00000211677 IGLC2 ENSG00000211677 immunoglobulin lambda constant 2 (Kern-Oz- marker) NA NA
ENSG00000211679 IGLC3 ENSG00000211679 immunoglobulin lambda constant 3 (Kern-Oz+ marker) NA NA
100423062 IGLL5 ENSG00000254709 immunoglobulin lambda like polypeptide 5 This gene encodes one of the immunoglobulin lambda-like polypeptides. It is located within the immunoglobulin lambda locus but it does not require somatic rearrangement for expression. The first exon of this gene is unrelated to immunoglobulin variable genes; the second and third exons are the immunoglobulin lambda joining 1 and the immunoglobulin lambda constant 1 gene segments. Alternative splicing results in multiple transcript variants. NA
10900 RUNDC3A ENSG00000108309 RUN domain containing 3A NA NA
ENSG00000211675 IGLC1 ENSG00000211675 immunoglobulin lambda constant 1 (Mcg marker) NA NA
51316 PLAC8 ENSG00000145287 placenta specific 8 NA NA
100507387 LOC100507387 ENSG00000182230 uncharacterized LOC100507387 NA NA
202134 FAM153B ENSG00000182230 family with sequence similarity 153 member B NA NA
ENSG00000211895 IGHA1 ENSG00000211895 immunoglobulin heavy constant alpha 1 NA NA
ENSG00000211893 IGHG2 ENSG00000211893 immunoglobulin heavy constant gamma 2 (G2m marker) NA NA
401027 C2orf66 ENSG00000187944 chromosome 2 open reading frame 66 NA NA
ENSG00000253364 RP11-731F5.2 ENSG00000253364 NA NA NA
973 CD79A ENSG00000105369 CD79a molecule The B lymphocyte antigen receptor is a multimeric complex that includes the antigen-specific component, surface immunoglobulin (Ig). Surface Ig non-covalently associates with two other proteins, Ig-alpha and Ig-beta, which are necessary for expression and function of the B-cell antigen receptor. This gene encodes the Ig-alpha protein of the B-cell antigen component. Alternatively spliced transcript variants encoding different isoforms have been described. NA
11065 UBE2C ENSG00000175063 ubiquitin conjugating enzyme E2 C The modification of proteins with ubiquitin is an important cellular mechanism for targeting abnormal or short-lived proteins for degradation. Ubiquitination involves at least three classes of enzymes: ubiquitin-activating enzymes, ubiquitin-conjugating enzymes, and ubiquitin-protein ligases. This gene encodes a member of the E2 ubiquitin-conjugating enzyme family. The encoded protein is required for the destruction of mitotic cyclins and for cell cycle progression, and may be involved in cancer progression. Multiple transcript variants encoding different isoforms have been found for this gene. Pseudogenes of this gene have been defined on chromosomes 4, 14, 15, 18, and 19. NA
3801 KIFC3 ENSG00000140859 kinesin family member C3 This gene encodes a member of the kinesin-14 family of microtubule motors. Members of this family play a role in the formation, maintenance and remodeling of the bipolar mitotic spindle. The protein encoded by this gene has cytoplasmic functions in the interphase cells. It may also be involved in the final stages of cytokinesis. Alternative splicing results in multiple transcript variants encoding different isoforms. NA
5909 RAP1GAP ENSG00000076864 RAP1 GTPase activating protein This gene encodes a type of GTPase-activating-protein (GAP) that down-regulates the activity of the ras-related RAP1 protein. RAP1 acts as a molecular switch by cycling between an inactive GDP-bound form and an active GTP-bound form. The product of this gene, RAP1GAP, promotes the hydrolysis of bound GTP and hence returns RAP1 to the inactive state whereas other proteins, guanine nucleotide exchange factors (GEFs), act as RAP1 activators by facilitating the conversion of RAP1 from the GDP- to the GTP-bound form. In general, ras subfamily proteins, such as RAP1, play key roles in receptor-linked signaling pathways that control cell growth and differentiation. RAP1 plays a role in diverse processes such as cell proliferation, adhesion, differentiation, and embryogenesis. Alternative splicing results in multiple transcript variants encoding distinct proteins. NA
23397 NCAPH ENSG00000121152 non-SMC condensin I complex subunit H This gene encodes a member of the barr gene family and a regulatory subunit of the condensin complex. This complex is required for the conversion of interphase chromatin into condensed chromosomes. The protein encoded by this gene is associated with mitotic chromosomes, except during the early phase of chromosome condensation. During interphase, the protein has a distinct punctate nucleolar localization. Alternatively spliced transcript variants encoding different proteins have been described. NA
ENSG00000211897 IGHG3 ENSG00000211897 immunoglobulin heavy constant gamma 3 (G3m marker) NA NA
ENSG00000211896 IGHG1 ENSG00000211896 immunoglobulin heavy constant gamma 1 (G1m marker) NA NA
4085 MAD2L1 ENSG00000164109 MAD2 mitotic arrest deficient-like 1 (yeast) MAD2L1 is a component of the mitotic spindle assembly checkpoint that prevents the onset of anaphase until all chromosomes are properly aligned at the metaphase plate. MAD2L1 is related to the MAD2L2 gene located on chromosome 1. A MAD2 pseudogene has been mapped to chromosome 14. NA
51203 NUSAP1 ENSG00000137804 nucleolar and spindle associated protein 1 NUSAP1 is a nucleolar-spindle-associated protein that plays a role in spindle microtubule organization (Raemaekers et al., 2003 [PubMed 12963707]). NA
122618 PLD4 ENSG00000166428 phospholipase D family member 4 NA NA
933 CD22 ENSG00000012124 CD22 molecule NA NA
NA NA ENSG00000256390 NA NA TRUE
9547 CXCL14 ENSG00000145824 C-X-C motif chemokine ligand 14 This antimicrobial gene belongs to the cytokine gene family which encode secreted proteins involved in immunoregulatory and inflammatory processes. The protein encoded by this gene is structurally related to the CXC (Cys-X-Cys) subfamily of cytokines. Members of this subfamily are characterized by two cysteines separated by a single amino acid. This cytokine displays chemotactic activity for monocytes but not for lymphocytes, dendritic cells, neutrophils or macrophages. It has been implicated that this cytokine is involved in the homeostasis of monocyte-derived macrophages rather than in inflammation. NA
5443 POMC ENSG00000115138 proopiomelanocortin This gene encodes a preproprotein that undergoes extensive, tissue-specific, post-translational processing via cleavage by subtilisin-like enzymes known as prohormone convertases. There are eight potential cleavage sites within the preproprotein and, depending on tissue type and the available convertases, processing may yield as many as ten biologically active peptides involved in diverse cellular functions. The encoded protein is synthesized mainly in corticotroph cells of the anterior pituitary where four cleavage sites are used; adrenocorticotrophin, essential for normal steroidogenesis and the maintenance of normal adrenal weight, and lipotropin beta are the major end products. In other tissues, including the hypothalamus, placenta, and epithelium, all cleavage sites may be used, giving rise to peptides with roles in pain and energy homeostasis, melanocyte stimulation, and immune modulation. These include several distinct melanotropins, lipotropins, and endorphins that are contained within the adrenocorticotrophin and beta-lipotropin peptides. The antimicrobial melanotropin alpha peptide exhibits antibacterial and antifungal activity. Mutations in this gene have been associated with early onset obesity, adrenal insufficiency, and red hair pigmentation. Alternatively spliced transcript variants encoding the same protein have been described. NA
55619 DOCK10 ENSG00000135905 dedicator of cytokinesis 10 This gene encodes a member of the dedicator of cytokinesis protein family. Members of this family are guanosine nucleotide exchange factors for Rho GTPases and defined by the presence of conserved DOCK-homology regions. The encoded protein belongs to the D (or Zizimin) subfamily of DOCK proteins, which also contain an N-terminal pleckstrin homology domain. Alternatively spliced transcript variants that encode different isoforms have been described. NA
22974 TPX2 ENSG00000088325 TPX2, microtubule nucleation factor NA NA
ENSG00000223353 RP11-290P14.2 ENSG00000223353 NA NA NA
83461 CDCA3 ENSG00000111665 cell division cycle associated 3 NA NA
118430 MUCL1 ENSG00000172551 mucin like 1 NA NA
1145 CHRNE ENSG00000108556 cholinergic receptor nicotinic epsilon subunit Acetylcholine receptors at mature mammalian neuromuscular junctions are pentameric protein complexes composed of four subunits in the ratio of two alpha subunits to one beta, one epsilon, and one delta subunit. The acetylcholine receptor changes subunit composition shortly after birth when the epsilon subunit replaces the gamma subunit seen in embryonic receptors. Mutations in the epsilon subunit are associated with congenital myasthenic syndrome. NA
5888 RAD51 ENSG00000051180 RAD51 recombinase The protein encoded by this gene is a member of the RAD51 protein family. RAD51 family members are highly similar to bacterial RecA and Saccharomyces cerevisiae Rad51, and are known to be involved in the homologous recombination and repair of DNA. This protein can interact with the ssDNA-binding protein RPA and RAD52, and it is thought to play roles in homologous pairing and strand transfer of DNA. This protein is also found to interact with BRCA1 and BRCA2, which may be important for the cellular response to DNA damage. BRCA2 is shown to regulate both the intracellular localization and DNA-binding ability of this protein. Loss of these controls following BRCA2 inactivation may be a key event leading to genomic instability and tumorigenesis. Multiple transcript variants encoding different isoforms have been found for this gene. NA
221424 LRRC73 ENSG00000204052 leucine rich repeat containing 73 NA NA
51676 ASB2 ENSG00000100628 ankyrin repeat and SOCS box containing 2 This gene encodes a member of the ankyrin repeat and SOCS box-containing (ASB) protein family. These proteins play a role in protein degradation by coupling suppressor of cytokine signalling (SOCS) proteins with the elongin BC complex. The encoded protein is a subunit of a multimeric E3 ubiquitin ligase complex that mediates the degradation of actin-binding proteins. This gene plays a role in retinoic acid-induced growth inhibition and differentiation of myeloid leukemia cells. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene. NA
144406 WDR66 ENSG00000158023 WD repeat domain 66 The protein encoded by this gene belongs to the WD repeat-containing family of proteins, which function in the formation of protein-protein complexes in a variety of biological pathways. This family member appears to function in the determination of mean platelet volume (MPV), and polymorphisms in this gene have been associated with variance in MPV. Alternative splicing of this gene results in multiple transcript variants. NA
113130 CDCA5 ENSG00000146670 cell division cycle associated 5 NA NA
79696 ZC2HC1C ENSG00000119703 zinc finger C2HC-type containing 1C NA NA
51659 GINS2 ENSG00000131153 GINS complex subunit 2 The yeast heterotetrameric GINS complex is made up of Sld5 (GINS4; MIM 610611), Psf1 (GINS1; MIM 610608), Psf2, and Psf3 (GINS3; MIM 610610). The formation of this complex is essential for the initiation of DNA replication in yeast and Xenopus egg extracts (Ueno et al., 2005 [PubMed 16287864]). See GINS1 for additional information about the GINS complex. NA
ENSG00000225062 CATIP-AS1 ENSG00000225062 CATIP antisense RNA 1 NA NA
ENSG00000204677 FAM153C ENSG00000204677 family with sequence similarity 153 member C NA NA
7038 TG ENSG00000042832 thyroglobulin Thyroglobulin (Tg) is a glycoprotein homodimer produced predominantly by the thyroid gland. It acts as a substrate for the synthesis of thyroxine and triiodothyronine as well as the storage of the inactive forms of thyroid hormone and iodine. Thyroglobulin is secreted from the endoplasmic reticulum to its site of iodination, and subsequent thyroxine biosynthesis, in the follicular lumen. Mutations in this gene cause thyroid dyshormonogenesis, manifested as goiter, and are associated with moderate to severe congenital hypothyroidism. Polymorphisms in this gene are associated with susceptibility to autoimmune thyroid diseases (AITD) such as Graves disease and Hashimoto thyroiditis. NA
5800 PTPRO ENSG00000151490 protein tyrosine phosphatase, receptor type O This gene encodes a member of the R3 subtype family of receptor-type protein tyrosine phosphatases. These proteins are localized to the apical surface of polarized cells and may have tissue-specific functions through activation of Src family kinases. This gene contains two distinct promoters, and alternatively spliced transcript variants encoding multiple isoforms have been observed. The encoded proteins may have multiple isoform-specific and tissue-specific functions, including the regulation of osteoclast production and activity, inhibition of cell proliferation and facilitation of apoptosis. This gene is a candidate tumor suppressor, and decreased expression of this gene has been observed in several types of cancer. NA
ENSG00000211890 IGHA2 ENSG00000211890 immunoglobulin heavy constant alpha 2 (A2m marker) NA NA
1081 CGA ENSG00000135346 glycoprotein hormones, alpha polypeptide The four human glycoprotein hormones chorionic gonadotropin (CG), luteinizing hormone (LH), follicle stimulating hormone (FSH), and thyroid stimulating hormone (TSH) are dimers consisting of alpha and beta subunits that are associated noncovalently. The alpha subunits of these hormones are identical, however, their beta chains are unique and confer biological specificity. The protein encoded by this gene is the alpha subunit and belongs to the glycoprotein hormones alpha chain family. Two transcript variants encoding different isoforms have been found for this gene. NA
6865 TACR2 ENSG00000075073 tachykinin receptor 2 This gene belongs to a family of genes that function as receptors for tachykinins. Receptor affinities are specified by variations in the 5’-end of the sequence. The receptors belonging to this family are characterized by interactions with G proteins and 7 hydrophobic transmembrane regions. This gene encodes the receptor for the tachykinin neuropeptide substance K, also referred to as neurokinin A. NA
11339 OIP5 ENSG00000104147 Opa interacting protein 5 The protein encoded by this gene localizes to centromeres, where it is essential for recruitment of CENP-A through the mediator Holliday junction recognition protein. Expression of this gene is upregulated in several cancers, making it a putative therapeutic target. Two transcript variants encoding different isoforms have been found for this gene. NA
5891 MOK ENSG00000080823 MOK protein kinase This gene belongs to the MAP kinase superfamily. The gene was found to be regulated by caudal type transcription factor 2 (Cdx2) protein. The encoded protein, which is localized to epithelial cells in the intestinal crypt, may play a role in growth arrest and differentiation of cells of upper crypt and lower villus regions. Multiple alternatively spliced transcript variants encoding different isoforms have been observed for this gene. NA
4157 MC1R ENSG00000258839 melanocortin 1 receptor This intronless gene encodes the receptor protein for melanocyte-stimulating hormone (MSH). The encoded protein, a seven pass transmembrane G protein coupled receptor, controls melanogenesis. Two types of melanin exist: red pheomelanin and black eumelanin. Gene mutations that lead to a loss in function are associated with increased pheomelanin production, which leads to lighter skin and hair color. Eumelanin is photoprotective but pheomelanin may contribute to UV-induced skin damage by generating free radicals upon UV radiation. Binding of MSH to its receptor activates the receptor and stimulates eumelanin synthesis. This receptor is a major determining factor in sun sensitivity and is a genetic risk factor for melanoma and non-melanoma skin cancer. Over 30 variant alleles have been identified which correlate with skin and hair color, providing evidence that this gene is an important component in determining normal human pigment variation. NA
9455 HOMER2 ENSG00000103942 homer scaffolding protein 2 This gene encodes a member of the homer family of dendritic proteins. Members of this family regulate group 1 metabotropic glutamate receptor function. The encoded protein is a postsynaptic density scaffolding protein. Alternative splicing results in multiple transcript variants. Two related pseudogenes have been identified on chromosome 14. NA
283284 IGSF22 ENSG00000179057 immunoglobulin superfamily member 22 NA NA
83450 DRC3 ENSG00000171962 dynein regulatory complex subunit 3 NA NA
64105 CENPK ENSG00000123219 centromere protein K CENPK is a subunit of a CENPH (MIM 605607)-CENPI (MIM 300065)-associated centromeric complex that targets CENPA (MIM 117139) to centromeres and is required for proper kinetochore function and mitotic progression (Okada et al., 2006 [PubMed 16622420]). NA
84057 MND1 ENSG00000121211 meiotic nuclear divisions 1 The product of the MND1 gene associates with HOP2 (MIM 608665) to form a stable heterodimeric complex that binds DNA and stimulates the recombinase activity of RAD51 (MIM 179617) and DMC1 (MIM 602721) (Chi et al., 2007 [PubMed 17639080]). Both the MND1 and HOP2 genes are indispensable for meiotic recombination. NA
91147 TMEM67 ENSG00000164953 transmembrane protein 67 The protein encoded by this gene localizes to the primary cilium and to the plasma membrane. The gene functions in centriole migration to the apical membrane and formation of the primary cilium. Multiple transcript variants encoding different isoforms have been found for this gene. Defects in this gene are a cause of Meckel syndrome type 3 (MKS3) and Joubert syndrome type 6 (JBTS6). NA
2649 NR6A1 ENSG00000148200 nuclear receptor subfamily 6 group A member 1 This gene encodes an orphan nuclear receptor which is a member of the nuclear hormone receptor family. Its expression pattern suggests that it may be involved in neurogenesis and germ cell development. The protein can homodimerize and bind DNA, but in vivo targets have not been identified. Alternate splicing results in multiple transcript variants. NA
NA NA ENSG00000034063 NA NA TRUE
23762 OSBP2 ENSG00000184792 oxysterol binding protein 2 The protein encoded by this gene contains a pleckstrin homology (PH) domain and an oxysterol-binding region. It binds oxysterols such as 7-ketocholesterol and may inhibit their cytotoxicity. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene. NA
ENSG00000250654 RP11-834C11.7 ENSG00000250654 NA NA NA
NA NA ENSG00000234603 NA NA TRUE
55143 CDCA8 ENSG00000134690 cell division cycle associated 8 This gene encodes a component of the chromosomal passenger complex. This complex is an essential regulator of mitosis and cell division. This protein is cell-cycle regulated and is required for chromatin-induced microtubule stabilization and spindle formation. Alternate splicing results in multiple transcript variants. Pseudogenes of this gene are found on chromosomes 7, 8 and 16. NA
ENSG00000166770 ZNF667-AS1 ENSG00000166770 ZNF667 antisense RNA 1 (head to head) NA NA
1047 CLGN ENSG00000153132 calmegin Calmegin is a testis-specific endoplasmic reticulum chaperone protein. CLGN may play a role in spermatogenesis and infertility. NA
118491 CFAP70 ENSG00000156042 cilia and flagella associated protein 70 NA NA
NA NA ENSG00000260655 NA NA TRUE
63934 ZNF667 ENSG00000198046 zinc finger protein 667 NA NA
7083 TK1 ENSG00000167900 thymidine kinase 1 NA NA
229 ALDOB ENSG00000136872 aldolase, fructose-bisphosphate B Fructose-1,6-bisphosphate aldolase (EC 4.1.2.13) is a tetrameric glycolytic enzyme that catalyzes the reversible conversion of fructose-1,6-bisphosphate to glyceraldehyde 3-phosphate and dihydroxyacetone phosphate. Vertebrates have 3 aldolase isozymes which are distinguished by their electrophoretic and catalytic properties. Differences indicate that aldolases A, B, and C are distinct proteins, the products of a family of related ‘housekeeping’ genes exhibiting developmentally regulated expression of the different isozymes. The developing embryo produces aldolase A, which is produced in even greater amounts in adult muscle where it can be as much as 5% of total cellular protein. In adult liver, kidney and intestine, aldolase A expression is repressed and aldolase B is produced. In brain and other nervous tissue, aldolase A and C are expressed about equally. There is a high degree of homology between aldolase A and C. Defects in ALDOB cause hereditary fructose intolerance. NA
283417 DPY19L2 ENSG00000177990 dpy-19 like 2 The protein encoded by this gene belongs to the dpy-19 family. It is highly expressed in testis, and is required for sperm head elongation and acrosome formation during spermatogenesis. Mutations in this gene are associated with an infertility disorder, spermatogenic failure type 9 (SPGF9). NA
3976 LIF ENSG00000128342 leukemia inhibitory factor The protein encoded by this gene is a pleiotropic cytokine with roles in several different systems. It is involved in the induction of hematopoietic differentiation in normal and myeloid leukemia cells, induction of neuronal cell differentiation, regulator of mesenchymal to epithelial conversion during kidney development, and may also have a role in immune tolerance at the maternal-fetal interface. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene. NA
5880 RAC2 ENSG00000128340 ras-related C3 botulinum toxin substrate 2 (rho family, small GTP binding protein Rac2) This gene encodes a member of the Ras superfamily of small guanosine triphosphate (GTP)-metabolizing proteins. The encoded protein localizes to the plasma membrane, where it regulates diverse processes, such as secretion, phagocytosis, and cell polarization. Activity of this protein is also involved in the generation of reactive oxygen species. Mutations in this gene are associated with neutrophil immunodeficiency syndrome. There is a pseudogene for this gene on chromosome 6. NA
55010 PARPBP ENSG00000185480 PARP1 binding protein NA NA
346653 FAM71F2 ENSG00000205085 family with sequence similarity 71 member F2 NA NA
5819 NECTIN2 ENSG00000130202 nectin cell adhesion molecule 2 This gene encodes a single-pass type I membrane glycoprotein with two Ig-like C2-type domains and an Ig-like V-type domain. This protein is one of the plasma membrane components of adherens junctions. It also serves as an entry for certain mutant strains of herpes simplex virus and pseudorabies virus, and it is involved in cell to cell spreading of these viruses. Variations in this gene have been associated with differences in the severity of multiple sclerosis. Alternate transcriptional splice variants, encoding different isoforms, have been characterized. NA
10635 RAD51AP1 ENSG00000111247 RAD51 associated protein 1 NA NA
5347 PLK1 ENSG00000166851 polo like kinase 1 The Ser/Thr protein kinase encoded by this gene belongs to the CDC5/Polo subfamily. It is highly expressed during mitosis and elevated levels are found in many different types of cancer. Depletion of this protein in cancer cells dramatically inhibited cell proliferation and induced apoptosis; hence, it is a target for cancer therapy. NA
816 CAMK2B ENSG00000058404 calcium/calmodulin dependent protein kinase II beta The product of this gene belongs to the serine/threonine protein kinase family and to the Ca(2+)/calmodulin-dependent protein kinase subfamily. Calcium signaling is crucial for several aspects of plasticity at glutamatergic synapses. In mammalian cells, the enzyme is composed of four different chains: alpha, beta, gamma, and delta. The product of this gene is a beta chain. It is possible that distinct isoforms of this chain have different cellular localizations and interact differently with calmodulin. Alternative splicing results in multiple transcript variants. NA
1775 DNASE1L2 ENSG00000167968 deoxyribonuclease I-like 2 NA NA
962 CD48 ENSG00000117091 CD48 molecule This gene encodes a member of the CD2 subfamily of immunoglobulin-like receptors which includes SLAM (signaling lymphocyte activation molecules) proteins. The encoded protein is found on the surface of lymphocytes and other immune cells, dendritic cells and endothelial cells, and participates in activation and differentiation pathways in these cells. The encoded protein does not have a transmembrane domain, but is held at the cell surface by a GPI anchor via a C-terminal domain which may be cleaved to yield a soluble form of the receptor. Multiple transcript variants encoding different isoforms have been found for this gene. NA
56992 KIF15 ENSG00000163808 kinesin family member 15 NA NA
890 CCNA2 ENSG00000145386 cyclin A2 The protein encoded by this gene belongs to the highly conserved cyclin family, whose members are characterized by a dramatic periodicity in protein abundance through the cell cycle. Cyclins function as regulators of CDK kinases. Different cyclins exhibit distinct expression and degradation patterns which contribute to the temporal coordination of each mitotic event. In contrast to cyclin A1, which is present only in germ cells, this cyclin is expressed in all tissues tested. This cyclin binds and activates CDC2 or CDK2 kinases, and thus promotes both cell cycle G1/S and G2/M transitions. NA
249 ALPL ENSG00000162551 alkaline phosphatase, liver/bone/kidney This gene encodes a member of the alkaline phosphatase family of proteins. There are at least four distinct but related alkaline phosphatases: intestinal, placental, placental-like, and liver/bone/kidney (tissue non-specific). The first three are located together on chromosome 2, while the tissue non-specific form is located on chromosome 1. The product of this gene is a membrane bound glycosylated enzyme that is not expressed in any particular tissue and is, therefore, referred to as the tissue-nonspecific form of the enzyme. Alternative splicing results in multiple transcript variants, at least one of which encodes a preproprotein that is proteolytically processed to generate the mature enzyme. This enzyme may play a role in bone mineralization. Mutations in this gene have been linked to hypophosphatasia, a disorder that is characterized by hypercalcemia and skeletal defects. NA
4879 NPPB ENSG00000120937 natriuretic peptide B This gene is a member of the natriuretic peptide family and encodes a secreted protein which functions as a cardiac hormone. The protein undergoes two cleavage events, one within the cell and a second after secretion into the blood. The protein’s biological actions include natriuresis, diuresis, vasorelaxation, inhibition of renin and aldosterone secretion, and a key role in cardiovascular homeostasis. A high concentration of this protein in the bloodstream is indicative of heart failure. The protein also acts as an antimicrobial peptide with antibacterial and antifungal activity. Mutations in this gene have been associated with postmenopausal osteoporosis. NA
1917 EEF1A2 ENSG00000101210 eukaryotic translation elongation factor 1 alpha 2 This gene encodes an isoform of the alpha subunit of the elongation factor-1 complex, which is responsible for the enzymatic delivery of aminoacyl tRNAs to the ribosome. This isoform (alpha 2) is expressed in brain, heart and skeletal muscle, and the other isoform (alpha 1) is expressed in brain, placenta, lung, liver, kidney, and pancreas. This gene may be critical in the development of ovarian cancer. NA
4050 LTB ENSG00000227507 lymphotoxin beta Lymphotoxin beta is a type II membrane protein of the TNF family. It anchors lymphotoxin-alpha to the cell surface through heterotrimer formation. The predominant form on the lymphocyte surface is the lymphotoxin-alpha 1/beta 2 complex (e.g. 1 molecule alpha/2 molecules beta) and this complex is the primary ligand for the lymphotoxin-beta receptor. The minor complex is lymphotoxin-alpha 2/beta 1. LTB is an inducer of the inflammatory response system and involved in normal development of lymphoid tissue. Lymphotoxin-beta isoform b is unable to complex with lymphotoxin-alpha suggesting a function for lymphotoxin-beta which is independent of lymphotoxin-alpha. Alternative splicing results in multiple transcript variants encoding different isoforms. NA
23250 ATP11A ENSG00000068650 ATPase phospholipid transporting 11A The protein encoded by this gene is an integral membrane ATPase. The encoded protein is probably phosphorylated in its intermediate state and likely drives the transport of ions such as calcium across membranes. Two transcript variants encoding different isoforms have been found for this gene. NA
1728 NQO1 ENSG00000181019 NAD(P)H quinone dehydrogenase 1 This gene is a member of the NAD(P)H dehydrogenase (quinone) family and encodes a cytoplasmic 2-electron reductase. This FAD-binding protein forms homodimers and reduces quinones to hydroquinones. This protein’s enzymatic activity prevents the one electron reduction of quinones that results in the production of radical species. Mutations in this gene have been associated with tardive dyskinesia (TD), an increased risk of hematotoxicity after exposure to benzene, and susceptibility to various forms of cancer. Altered expression of this protein has been seen in many tumors and is also associated with Alzheimer’s disease (AD). Alternate transcriptional splice variants, encoding different isoforms, have been characterized. NA
4635 MYL4 ENSG00000198336 myosin light chain 4 Myosin is a hexameric ATPase cellular motor protein. It is composed of two myosin heavy chains, two nonphosphorylatable myosin alkali light chains, and two phosphorylatable myosin regulatory light chains. This gene encodes a myosin alkali light chain that is found in embryonic muscle and adult atria. Two alternatively spliced transcript variants encoding the same protein have been found for this gene. NA
10404 CPQ ENSG00000104324 carboxypeptidase Q This gene encodes a metallopeptidase that belongs to the peptidase M28 family. The encoded protein may catalyze the cleavage of dipeptides with unsubstituted terminals into amino acids. NA
388588 SMIM1 ENSG00000235169 small integral membrane protein 1 (Vel blood group) This gene encodes a small, conserved protein that participates in red blood cell formation. The encoded protein is localized to the cell membrane and is the antigen for the Vel blood group. Alternative splicing results in different transcript variants that encode the same protein. NA
100500808 MIR3917 ENSG00000264021 microRNA 3917 microRNAs (miRNAs) are short (20-24 nt) non-coding RNAs that are involved in post-transcriptional regulation of gene expression in multicellular organisms by affecting both the stability and translation of mRNAs. miRNAs are transcribed by RNA polymerase II as part of capped and polyadenylated primary transcripts (pri-miRNAs) that can be either protein-coding or non-coding. The primary transcript is cleaved by the Drosha ribonuclease III enzyme to produce an approximately 70-nt stem-loop precursor miRNA (pre-miRNA), which is further cleaved by the cytoplasmic Dicer ribonuclease to generate the mature miRNA and antisense miRNA star (miRNA*) products. The mature miRNA is incorporated into a RNA-induced silencing complex (RISC), which recognizes target mRNAs through imperfect base pairing with the miRNA and most commonly results in translational inhibition or destabilization of the target mRNA. The RefSeq represents the predicted microRNA stem-loop. NA
115123 MARCH3 ENSG00000173926 membrane associated ring-CH-type finger 3 This gene encodes a member of the membrane-associated RING-CH (MARCH) family. The encoded protein is an E3 ubiquitin-protein ligase that may be involved in regulation of the endosomal transport pathway. NA
ENSG00000188985 DHFRP1 ENSG00000188985 dihydrofolate reductase pseudogene 1 NA NA
NA NA ENSG00000237485 NA NA TRUE
80235 PIGZ ENSG00000119227 phosphatidylinositol glycan anchor biosynthesis class Z The glycosylphosphatidylinositol (GPI) anchor is a glycolipid found on many blood cells that serves to anchor proteins to the cell surface. This gene encodes a protein that is localized to the endoplasmic reticulum, and is involved in GPI anchor biosynthesis. As shown for the yeast homolog, which is a member of a family of dolichol-phosphate-mannose (Dol-P-Man)-dependent mannosyltransferases, this protein can also add a side-branching fourth mannose to GPI precursors during the assembly of GPI anchors. NA
79019 CENPM ENSG00000100162 centromere protein M The protein encoded by this gene is an inner protein of the kinetochore, the multi-protein complex that binds spindle microtubules to regulate chromosome segregation during cell division. It belongs to the constitutive centromere-associated network protein group, whose members interact with outer kinetochore proteins and help to maintain centromere identity at each cell division cycle. The protein is structurally related to GTPases but cannot bind guanosine triphosphate. A point mutation that affects interaction with another constitutive centromere-associated network protein, CENP-I, impairs kinetochore assembly and chromosome alignment, suggesting that it is required for kinetochore formation. Alternative splicing results in multiple transcript variants. NA
ENSG00000256663 RP11-424C20.2 ENSG00000256663 NA NA NA
642280 ZNF876P ENSG00000198155 zinc finger protein 876, pseudogene NA NA
7137 TNNI3 ENSG00000129991 troponin I3, cardiac type Troponin I (TnI), along with troponin T (TnT) and troponin C (TnC), is one of 3 subunits that form the troponin complex of the thin filaments of striated muscle. TnI is the inhibitory subunit; blocking actin-myosin interactions and thereby mediating striated muscle relaxation. The TnI subfamily contains three genes: TnI-skeletal-fast-twitch, TnI-skeletal-slow-twitch, and TnI-cardiac. This gene encodes the TnI-cardiac protein and is exclusively expressed in cardiac muscle tissues. Mutations in this gene cause familial hypertrophic cardiomyopathy type 7 (CMH7) and familial restrictive cardiomyopathy (RCM). NA
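Several rows in the table above have `notfound` set to TRUE and carry no symbol or name (for example ENSG00000256390). The following is a minimal sketch of one way to separate those rows from the matched ones. It uses a small hand-built data frame as a stand-in for `as.data.frame(out)` so that it runs without a network query; `split_annotations` is a hypothetical helper, not part of mygene, and the three toy rows are copied from the table above. It also assumes that the `notfound` column may be absent when every query is matched, as in the Factor 1 table.

```r
# Toy stand-in for as.data.frame(out); columns mirror the table above.
annot <- data.frame(
  query    = c("ENSG00000175063", "ENSG00000256390", "ENSG00000166428"),
  symbol   = c("UBE2C", NA, "PLD4"),
  notfound = c(NA, TRUE, NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper: split rows into matched and unmatched queries.
# A row counts as unmatched only when notfound is TRUE; NA means a hit was returned.
split_annotations <- function(df) {
  if (is.null(df$notfound)) {
    # No notfound column at all: treat every query as matched.
    missing <- rep(FALSE, nrow(df))
  } else {
    missing <- !is.na(df$notfound) & df$notfound
  }
  list(found     = df[!missing, , drop = FALSE],
       not_found = df$query[missing])
}

res <- split_annotations(annot)
res$not_found   # Ensembl IDs with no annotation returned
nrow(res$found) # number of annotated rows
```

Listing the unmatched IDs separately makes it easier to check whether they are retired Ensembl identifiers before the gene list is used downstream.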
# Save the Ensembl IDs queried for Factor 4, one per line.
write.table(as.factor(out$query),
            paste0("../utilities/GTEX2013_sparse_fac_voom/gene_names_clus_", 4, ".txt"),
            col.names = FALSE, row.names = FALSE, quote = FALSE)
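The call above hard-codes the factor index in the file name, so it has to be edited by hand for each factor. A possible alternative is a small wrapper that takes the index as an argument. This is only a sketch: `write_factor_genes` and its `out_dir` argument are hypothetical names, and the example writes to a temporary directory rather than to `../utilities/`.

```r
# Hypothetical helper: write one query ID per line for a given factor.
write_factor_genes <- function(ids, factor_index, out_dir) {
  path <- file.path(out_dir, paste0("gene_names_clus_", factor_index, ".txt"))
  write.table(ids, path, col.names = FALSE, row.names = FALSE, quote = FALSE)
  invisible(path)  # return the path so the caller can check or reuse it
}

# Example with two IDs from the Factor 4 table, written to a temporary directory.
ids  <- c("ENSG00000175063", "ENSG00000140859")
path <- write_factor_genes(ids, 4, tempdir())
readLines(path)
```

In the analysis itself, `ids` would be `out$query` and `out_dir` the existing output directory.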

Factor 5 Annotations

out <- mygene::queryMany(gene_list[5,],  scopes="ensembl.gene", fields=c("name", "summary", "symbol"), species="human");
## Finished
## Pass returnall=TRUE to return lists of duplicate or missing query terms.
kable(as.data.frame(out))
symbol query X_id name summary notfound
CRNDE ENSG00000245694 ENSG00000245694 colorectal neoplasia differentially expressed (non-protein coding) NA NA
CMTM2 ENSG00000140932 146225 CKLF like MARVEL transmembrane domain containing 2 This gene belongs to the chemokine-like factor gene superfamily, a novel family that links the chemokine and the transmembrane 4 superfamilies of signaling molecules. The protein encoded by this gene may play an important role in testicular development. NA
QPRT ENSG00000103485 23475 quinolinate phosphoribosyltransferase This gene encodes a key enzyme in catabolism of quinolinate, an intermediate in the tryptophan-nicotinamide adenine dinucleotide pathway. Quinolinate acts as a most potent endogenous excitotoxin to neurons. Elevation of quinolinate levels in the brain has been linked to the pathogenesis of neurodegenerative disorders such as epilepsy, Alzheimer’s disease, and Huntington’s disease. Alternative splicing results in multiple transcript variants. NA
MANSC1 ENSG00000111261 54682 MANSC domain containing 1 NA NA
OLAH ENSG00000152463 55301 oleoyl-ACP hydrolase NA NA
BCHE ENSG00000114200 590 butyrylcholinesterase Mutant alleles at the BCHE locus are responsible for suxamethonium sensitivity. Homozygous persons sustain prolonged apnea after administration of the muscle relaxant suxamethonium in connection with surgical anesthesia. The activity of pseudocholinesterase in the serum is low and its substrate behavior is atypical. In the absence of the relaxant, the homozygote is at no known disadvantage. NA
PSD ENSG00000059915 5662 pleckstrin and Sec7 domain containing This gene encodes a Pleckstrin homology and SEC7 domains-containing protein that functions as a guanine nucleotide exchange factor. The encoded protein regulates signal transduction by activating ADP-ribosylation factor 6. Alternative splicing results in multiple transcript variants. NA
GALNT14 ENSG00000158089 79623 polypeptide N-acetylgalactosaminyltransferase 14 This gene encodes a Golgi protein which is a member of the polypeptide N-acetylgalactosaminyltransferase (ppGalNAc-Ts) protein family. These enzymes catalyze the transfer of N-acetyl-D-galactosamine (GalNAc) to the hydroxyl groups on serines and threonines in target peptides. The encoded protein has been shown to transfer GalNAc to large proteins like mucins. Multiple transcript variants encoding different isoforms have been found for this gene. NA
APOC1 ENSG00000130208 341 apolipoprotein C1 This gene encodes a member of the apolipoprotein C1 family. This gene is expressed primarily in the liver, and it is activated when monocytes differentiate into macrophages. The encoded protein plays a central role in high density lipoprotein (HDL) and very low density lipoprotein (VLDL) metabolism. This protein has also been shown to inhibit cholesteryl ester transfer protein in plasma. A pseudogene of this gene is located 4 kb downstream in the same orientation, on the same chromosome. This gene is mapped to chromosome 19, where it resides within an apolipoprotein gene cluster. NA
FOXC1 ENSG00000054598 2296 forkhead box C1 This gene belongs to the forkhead family of transcription factors which is characterized by a distinct DNA-binding forkhead domain. The specific function of this gene has not yet been determined; however, it has been shown to play a role in the regulation of embryonic and ocular development. Mutations in this gene cause various glaucoma phenotypes including primary congenital glaucoma, autosomal dominant iridogoniodysgenesis anomaly, and Axenfeld-Rieger anomaly. NA
RTKN ENSG00000114993 6242 rhotekin This gene encodes a scaffold protein that interacts with GTP-bound Rho proteins. Binding of this protein inhibits the GTPase activity of Rho proteins. This protein may interfere with the conversion of active, GTP-bound Rho to the inactive GDP-bound form by RhoGAP. Rho proteins regulate many important cellular processes, including cytokinesis, transcription, smooth muscle contraction, cell growth and transformation. Dysregulation of the Rho signal transduction pathway has been implicated in many forms of cancer. Alternative splicing results in multiple transcript variants encoding different isoforms. NA
SORT1 ENSG00000134243 6272 sortilin 1 This gene encodes a member of the VPS10-related sortilin family of proteins. The encoded preproprotein is proteolytically processed by furin to generate the mature receptor. This receptor plays a role in the trafficking of different proteins to either the cell surface, or subcellular compartments such as lysosomes and endosomes. Expression levels of this gene may influence the risk of myocardial infarction in human patients. Alternative splicing results in multiple transcript variants. NA
CCDC112 ENSG00000164221 153733 coiled-coil domain containing 112 NA NA
KCNAB1 ENSG00000169282 7881 potassium voltage-gated channel subfamily A member regulatory beta subunit 1 Potassium channels represent the most complex class of voltage-gated ion channels from both functional and structural standpoints. Their diverse functions include regulating neurotransmitter release, heart rate, insulin secretion, neuronal excitability, epithelial electrolyte transport, smooth muscle contraction, and cell volume. Four sequence-related potassium channel genes - shaker, shaw, shab, and shal - have been identified in Drosophila, and each has been shown to have human homolog(s). This gene encodes a member of the potassium channel, voltage-gated, shaker-related subfamily. This member includes distinct isoforms which are encoded by alternatively spliced transcript variants of this gene. Some of these isoforms are beta subunits, which form heteromultimeric complexes with alpha subunits and modulate the activity of the pore-forming alpha subunits. NA
RARRES3 ENSG00000133321 5920 retinoic acid receptor responder 3 Retinoids exert biologic effects such as potent growth inhibitory and cell differentiation activities and are used in the treatment of hyperproliferative dermatological diseases. These effects are mediated by specific nuclear receptor proteins that are members of the steroid and thyroid hormone receptor superfamily of transcriptional regulators. RARRES1, RARRES2, and RARRES3 are genes whose expression is upregulated by the synthetic retinoid tazarotene. RARRES3 is thought to act as a tumor suppressor or growth regulator. NA
RP3-342P20.2 ENSG00000228477 ENSG00000228477 NA NA NA
DRICH1 ENSG00000189269 51233 aspartate rich 1 NA NA
CAMK2N1 ENSG00000162545 55450 calcium/calmodulin dependent protein kinase II inhibitor 1 NA NA
PLBD1 ENSG00000121316 79887 phospholipase B domain containing 1 NA NA
NA ENSG00000180672 NA NA NA TRUE
AC005339.2 ENSG00000268565 ENSG00000268565 NA NA NA
LIF ENSG00000128342 3976 leukemia inhibitory factor The protein encoded by this gene is a pleiotropic cytokine with roles in several different systems. It is involved in the induction of hematopoietic differentiation in normal and myeloid leukemia cells, induction of neuronal cell differentiation, regulator of mesenchymal to epithelial conversion during kidney development, and may also have a role in immune tolerance at the maternal-fetal interface. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene. NA
NRP2 ENSG00000118257 8828 neuropilin 2 This gene encodes a member of the neuropilin family of receptor proteins. The encoded transmembrane protein binds to SEMA3C protein {sema domain, immunoglobulin domain (Ig), short basic domain, secreted, (semaphorin) 3C} and SEMA3F protein {sema domain, immunoglobulin domain (Ig), short basic domain, secreted, (semaphorin) 3F}, and interacts with vascular endothelial growth factor (VEGF). This protein may play a role in cardiovascular development, axon guidance, and tumorigenesis. Multiple transcript variants encoding distinct isoforms have been identified for this gene. NA
ACRBP ENSG00000111644 84519 acrosin binding protein The protein encoded by this gene is similar to proacrosin binding protein sp32 precursor found in mouse, guinea pig, and pig. This protein is located in the sperm acrosome and is thought to function as a binding protein to proacrosin for packaging and condensation of the acrosin zymogen in the acrosomal matrix. This protein is a member of the cancer/testis family of antigens and it is found to be immunogenic. In normal tissues, this mRNA is expressed only in testis, whereas it is detected in a range of different tumor types such as bladder, breast, lung, liver, and colon. NA
RP11-1143G9.4 ENSG00000257764 ENSG00000257764 NA NA NA
LYZ ENSG00000090382 4069 lysozyme This gene encodes human lysozyme, whose natural substrate is the bacterial cell wall peptidoglycan (cleaving the beta[1-4]glycosidic linkages between N-acetylmuramic acid and N-acetylglucosamine). Lysozyme is one of the antimicrobial agents found in human milk, and is also present in spleen, lung, kidney, white blood cells, plasma, saliva, and tears. The protein has antibacterial activity against a number of bacterial species. Missense mutations in this gene have been identified in heritable renal amyloidosis. NA
HIC1 ENSG00000177374 3090 hypermethylated in cancer 1 This gene functions as a growth regulatory and tumor repressor gene. Hypermethylation or deletion of the region of this gene have been associated with tumors and the contiguous-gene syndrome, Miller-Dieker syndrome. Alternative splicing of this gene results in multiple transcript variants. NA
PSAT1 ENSG00000135069 29968 phosphoserine aminotransferase 1 This gene encodes a member of the class-V pyridoxal-phosphate-dependent aminotransferase family. The encoded protein is a phosphoserine aminotransferase and decreased expression may be associated with schizophrenia. Mutations in this gene are also associated with phosphoserine aminotransferase deficiency. Alternative splicing results in multiple transcript variants. Pseudogenes of this gene have been defined on chromosomes 1, 3, and 8. NA
PARP14 ENSG00000173193 54625 poly(ADP-ribose) polymerase family member 14 Poly(ADP-ribosyl)ation is an immediate DNA damage-dependent posttranslational modification of histones and other nuclear proteins that contributes to the survival of injured proliferating cells. PARP14 belongs to the superfamily of enzymes that perform this modification (Ame et al., 2004 [PubMed 15273990]). NA
MAML3 ENSG00000196782 55534 mastermind like transcriptional coactivator 3 NA NA
ELN ENSG00000049540 2006 elastin This gene encodes a protein that is one of the two components of elastic fibers. The encoded protein is rich in hydrophobic amino acids such as glycine and proline, which form mobile hydrophobic regions bounded by crosslinks between lysine residues. Deletions and mutations in this gene are associated with supravalvular aortic stenosis (SVAS) and autosomal dominant cutis laxa. Multiple transcript variants encoding different isoforms have been found for this gene. NA
TUBB2B ENSG00000137285 347733 tubulin beta 2B class IIb The protein encoded by this gene is a beta isoform of tubulin, which binds GTP and is a major component of microtubules. This gene is highly similar to TUBB2A and TUBB2C. Defects in this gene are a cause of asymmetric polymicrogyria. NA
SLC17A9 ENSG00000101194 63910 solute carrier family 17 member 9 This gene encodes a member of a family of transmembrane proteins that are involved in the transport of small molecules. The encoded protein participates in the vesicular uptake, storage, and secretion of adenosine triphosphate (ATP) and other nucleotides. A mutation in this gene was found in individuals with autosomal dominant disseminated superficial actinic porokeratosis-8. Alternative splicing results in multiple transcript variants. NA
CEND1 ENSG00000184524 51286 cell cycle exit and neuronal differentiation 1 The protein encoded by this gene is a neuron-specific protein. The similar protein in pig enhances neuroblastoma cell differentiation in vitro and may be involved in neuronal differentiation in vivo. Multiple pseudogenes have been reported for this gene. NA
RP11-327P2.5 ENSG00000231856 ENSG00000231856 NA NA NA
TNFAIP8L1 ENSG00000185361 126282 TNF alpha induced protein 8 like 1 NA NA
OAF ENSG00000184232 220323 out at first homolog NA NA
STMN2 ENSG00000104435 11075 stathmin 2 This gene encodes a member of the stathmin family of phosphoproteins. Stathmin proteins function in microtubule dynamics and signal transduction. The encoded protein plays a regulatory role in neuronal growth and is also thought to be involved in osteogenesis. Reductions in the expression of this gene have been associated with Down’s syndrome and Alzheimer’s disease. Alternatively spliced transcript variants have been observed for this gene. A pseudogene of this gene is located on the long arm of chromosome 6. NA
FLRT1 ENSG00000126500 23769 fibronectin leucine rich transmembrane protein 1 This gene encodes a member of the fibronectin leucine rich transmembrane protein (FLRT) family. The family members may function in cell adhesion and/or receptor signalling. Their protein structures resemble small leucine-rich proteoglycans found in the extracellular matrix. The encoded protein shares sequence similarity with two other family members, FLRT2 and FLRT3. This gene is expressed in kidney and brain. NA
CD22 ENSG00000012124 933 CD22 molecule NA NA
STX3 ENSG00000166900 6809 syntaxin 3 The gene is a member of the syntaxin family. The encoded protein is targeted to the apical membrane of epithelial cells where it forms clusters and is important in establishing and maintaining polarity necessary for protein trafficking involving vesicle fusion and exocytosis. Alternative splicing results in multiple transcript variants. NA
H2AFJ ENSG00000246705 55766 H2A histone family member J Histones are basic nuclear proteins that are responsible for the nucleosome structure of the chromosomal fiber in eukaryotes. Nucleosomes consist of approximately 146 bp of DNA wrapped around a histone octamer composed of pairs of each of the four core histones (H2A, H2B, H3, and H4). The chromatin fiber is further compacted through the interaction of a linker histone, H1, with the DNA between the nucleosomes to form higher order chromatin structures. This gene is located on chromosome 12 and encodes a replication-independent histone that is a variant H2A histone. The protein is divergent at the C-terminus compared to the consensus H2A histone family member. This gene also encodes an antimicrobial peptide with antibacterial and antifungal activity. NA
DERL3 ENSG00000099958 91319 derlin 3 The protein encoded by this gene belongs to the derlin family, and resides in the endoplasmic reticulum (ER). Proteins that are unfolded or misfolded in the ER must be refolded or degraded to maintain the homeostasis of the ER. This protein appears to be involved in the degradation of misfolded glycoproteins in the ER. Several alternatively spliced transcript variants encoding different isoforms have been identified for this gene. NA
TNF ENSG00000232810 7124 tumor necrosis factor This gene encodes a multifunctional proinflammatory cytokine that belongs to the tumor necrosis factor (TNF) superfamily. This cytokine is mainly secreted by macrophages. It can bind to, and thus functions through its receptors TNFRSF1A/TNFR1 and TNFRSF1B/TNFBR. This cytokine is involved in the regulation of a wide spectrum of biological processes including cell proliferation, differentiation, apoptosis, lipid metabolism, and coagulation. This cytokine has been implicated in a variety of diseases, including autoimmune diseases, insulin resistance, and cancer. Knockout studies in mice also suggested the neuroprotective function of this cytokine. NA
ICAM1 ENSG00000090339 3383 intercellular adhesion molecule 1 This gene encodes a cell surface glycoprotein which is typically expressed on endothelial cells and cells of the immune system. It binds to integrins of type CD11a / CD18, or CD11b / CD18 and is also exploited by Rhinovirus as a receptor. NA
CCNJL ENSG00000135083 79616 cyclin J like NA NA
MFSD2A ENSG00000168389 84879 major facilitator superfamily domain containing 2A NA NA
RSPH9 ENSG00000172426 221421 radial spoke head 9 homolog This gene encodes a protein thought to be a component of the radial spoke head in motile cilia and flagella. Mutations in this gene are associated with primary ciliary dyskinesia 12. Alternative splicing results in multiple transcript variants. NA
RP11-532F6.3 ENSG00000272463 ENSG00000272463 NA NA NA
OPRL1 ENSG00000125510 4987 opioid related nociceptin receptor 1 The protein encoded by this gene is a member of the 7 transmembrane-spanning G protein-coupled receptor family, and functions as a receptor for the endogenous, opioid-related neuropeptide, nociceptin/orphanin FQ. This receptor-ligand system modulates a variety of biological functions and neurobehavior, including stress responses and anxiety behavior, learning and memory, locomotor activity, and inflammatory and immune responses. A promoter region between this gene and the 5’-adjacent RGS19 (regulator of G-protein signaling 19) gene on the opposite strand functions bi-directionally as a core-promoter for both genes, suggesting co-operative transcriptional regulation of these two functionally related genes. Alternatively spliced transcript variants have been described for this gene. A recent study provided evidence for translational readthrough in this gene and expression of an additional C-terminally extended isoform via the use of an alternative in-frame translation termination codon. NA
AC114730.2 ENSG00000235151 ENSG00000235151 NA NA NA
AMN1 ENSG00000151743 196394 antagonist of mitotic exit network 1 homolog NA NA
PDXP ENSG00000241360 57026 pyridoxal phosphatase Pyridoxal 5-prime-phosphate (PLP) is the active form of vitamin B6 that acts as a coenzyme in maintaining biochemical homeostasis. The preferred degradation route from PLP to 4-pyridoxic acid involves the dephosphorylation of PLP by PDXP (Jang et al., 2003 [PubMed 14522954]). NA
CPE ENSG00000109472 1363 carboxypeptidase E This gene encodes a member of the M14 family of metallocarboxypeptidases. The encoded preproprotein is proteolytically processed to generate the mature peptidase. This peripheral membrane protein cleaves C-terminal amino acid residues and is involved in the biosynthesis of peptide hormones and neurotransmitters, including insulin. This protein may also function independently of its peptidase activity, as a neurotrophic factor that promotes neuronal survival, and as a sorting receptor that binds to regulated secretory pathway proteins, including prohormones. Mutations in this gene are implicated in type 2 diabetes. NA
RP1-90J20.8 ENSG00000224846 ENSG00000224846 NA NA NA
KB-1572G7.3 ENSG00000211683 ENSG00000211683 NA NA NA
LY6E ENSG00000160932 4061 lymphocyte antigen 6 complex, locus E NA NA
SDS ENSG00000135094 10993 serine dehydratase This gene encodes one of three enzymes that are involved in metabolizing serine and glycine. L-serine dehydratase converts L-serine to pyruvate and ammonia and requires pyridoxal phosphate as a cofactor. The encoded protein can also metabolize threonine to NH4+ and 2-ketobutyrate. The encoded protein is found predominantly in the liver. NA
CTD-2240E14.4 ENSG00000267387 ENSG00000267387 NA NA NA
PPM1H ENSG00000111110 57460 protein phosphatase, Mg2+/Mn2+ dependent 1H NA NA
LOC102723927 ENSG00000237940 102723927 uncharacterized LOC102723927 NA NA
RP11-10C24.1 ENSG00000271020 ENSG00000271020 NA NA NA
DNPH1 ENSG00000112667 10591 2’-deoxynucleoside 5’-phosphate N-hydrolase 1 This gene was identified on the basis of its stimulation by c-Myc protein. The latter is a transcription factor that participates in the regulation of cell proliferation, differentiation, and apoptosis. The exact function of this gene is not known but studies in rat suggest a role in cellular proliferation and c-Myc-mediated transformation. Two alternative transcripts encoding different proteins have been described. NA
AC097724.3 ENSG00000226833 ENSG00000226833 NA NA NA
RP11-21A7A.3 ENSG00000256341 ENSG00000256341 NA NA NA
GBP1 ENSG00000117228 2633 guanylate binding protein 1 Guanylate binding protein expression is induced by interferon. Guanylate binding proteins are characterized by their ability to specifically bind guanine nucleotides (GMP, GDP, and GTP) and are distinguished from the GTP-binding proteins by the presence of 2 binding motifs rather than 3. NA
PIEZO1 ENSG00000103335 9780 piezo type mechanosensitive ion channel component 1 The protein encoded by this gene is a mechanically-activated ion channel that links mechanical forces to biological signals. The encoded protein contains 36 transmembrane domains and functions as a homotetramer. Defects in this gene have been associated with dehydrated hereditary stomatocytosis. NA
CYB5R4 ENSG00000065615 51167 cytochrome b5 reductase 4 NCB5OR is a flavohemoprotein that contains functional domains found in both cytochrome b5 (CYB5A; MIM 613218) and CYB5 reductase (CYB5R3; MIM 613213) (Zhu et al., 1999 [PubMed 10611283]). NA
SERPINI1 ENSG00000163536 5274 serpin family I member 1 This gene encodes a member of the serpin superfamily of serine proteinase inhibitors. The protein is primarily secreted by axons in the brain, and preferentially reacts with and inhibits tissue-type plasminogen activator. It is thought to play a role in the regulation of axonal growth and the development of synaptic plasticity. Mutations in this gene result in familial encephalopathy with neuroserpin inclusion bodies (FENIB), which is a dominantly inherited form of familial encephalopathy and epilepsy characterized by the accumulation of mutant neuroserpin polymers. Multiple alternatively spliced variants, encoding the same protein, have been identified. NA
NA ENSG00000175898 NA NA NA TRUE
GALNT7 ENSG00000109586 51809 polypeptide N-acetylgalactosaminyltransferase 7 This gene encodes GalNAc transferase 7, a member of the GalNAc-transferase family. The enzyme encoded by this gene controls the initiation step of mucin-type O-linked protein glycosylation and transfer of N-acetylgalactosamine to serine and threonine amino acid residues. This enzyme is a type II transmembrane protein and shares common sequence motifs with other family members. Unlike other family members, this enzyme shows exclusive specificity for partially GalNAc-glycosylated acceptor substrates and shows no activity with non-glycosylated peptides. This protein may function as a follow-up enzyme in the initiation step of O-glycosylation. NA
LTBP2 ENSG00000119681 4053 latent transforming growth factor beta binding protein 2 The protein encoded by this gene belongs to the family of latent transforming growth factor (TGF)-beta binding proteins (LTBP), which are extracellular matrix proteins with multi-domain structure. This protein is the largest member of the LTBP family possessing unique regions and with most similarity to the fibrillins. It has thus been suggested that it may have multiple functions: as a member of the TGF-beta latent complex, as a structural component of microfibrils, and a role in cell adhesion. NA
MEGF6 ENSG00000162591 1953 multiple EGF like domains 6 NA NA
NA ENSG00000232222 NA NA NA TRUE
RP11-798M19.3 ENSG00000248774 ENSG00000248774 NA NA NA
RP11-799B12.2 ENSG00000264924 ENSG00000264924 NA NA NA
PLXDC2 ENSG00000120594 84898 plexin domain containing 2 NA NA
ITM2B ENSG00000136156 9445 integral membrane protein 2B Amyloid precursor proteins are processed by beta-secretase and gamma-secretase to produce beta-amyloid peptides which form the characteristic plaques of Alzheimer disease. This gene encodes a transmembrane protein which is processed at the C-terminus by furin or furin-like proteases to produce a small secreted peptide which inhibits the deposition of beta-amyloid. Mutations which result in extension of the C-terminal end of the encoded protein, thereby increasing the size of the secreted peptide, are associated with two neurodegenerative diseases, familial British dementia and familial Danish dementia. NA
RP11-169K16.4 ENSG00000224459 ENSG00000224459 NA NA NA
GCA ENSG00000115271 25801 grancalcin This gene product, grancalcin, is a calcium-binding protein abundant in neutrophils and macrophages. It belongs to the penta-EF-hand subfamily of proteins which includes sorcin, calpain, and ALG-2. Grancalcin localization is dependent upon calcium and magnesium. In the absence of divalent cation, grancalcin localizes to the cytosolic fraction; with magnesium alone, it partitions with the granule fraction; and in the presence of magnesium and calcium, it associates with both the granule and membrane fractions, suggesting a role for grancalcin in granule-membrane fusion and degranulation. NA
MFGE8 ENSG00000140545 4240 milk fat globule-EGF factor 8 protein This gene encodes a preproprotein that is proteolytically processed to form multiple protein products. The major encoded protein product, lactadherin, is a membrane glycoprotein that promotes phagocytosis of apoptotic cells. This protein has also been implicated in wound healing, autoimmune disease, and cancer. Lactadherin can be further processed to form a smaller cleavage product, medin, which comprises the major protein component of aortic medial amyloid (AMA). Alternative splicing results in multiple transcript variants. NA
EGR2 ENSG00000122877 1959 early growth response 2 The protein encoded by this gene is a transcription factor with three tandem C2H2-type zinc fingers. Defects in this gene are associated with Charcot-Marie-Tooth disease type 1D (CMT1D), Charcot-Marie-Tooth disease type 4E (CMT4E), and with Dejerine-Sottas syndrome (DSS). Multiple transcript variants encoding two different isoforms have been found for this gene. NA
RP11-360F5.3 ENSG00000249685 ENSG00000249685 NA NA NA
NOV ENSG00000136999 4856 nephroblastoma overexpressed The protein encoded by this gene is a small secreted cysteine-rich protein and a member of the CCN family of regulatory proteins. CCN family proteins associate with the extracellular matrix and play an important role in cardiovascular and skeletal development, fibrosis and cancer development. NA
VAV2 ENSG00000160293 7410 vav guanine nucleotide exchange factor 2 VAV2 is the second member of the VAV guanine nucleotide exchange factor family of oncogenes. Unlike VAV1, which is expressed exclusively in hematopoietic cells, VAV2 transcripts were found in most tissues. Alternatively spliced transcript variants encoding different isoforms have been found for this gene. NA
RP11-323N12.5 ENSG00000267601 ENSG00000267601 NA NA NA
NKD2 ENSG00000145506 85409 naked cuticle homolog 2 This gene encodes a member of a family of proteins that function as negative regulators of Wnt receptor signaling through interaction with Dishevelled family members. The encoded protein participates in the delivery of transforming growth factor alpha-containing vesicles to the cell membrane. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene. NA
DYX1C1-CCPG1 ENSG00000261771 100533483 DYX1C1-CCPG1 readthrough (NMD candidate) This locus represents naturally occurring read-through transcription between the neighboring dyslexia susceptibility 1 candidate 1 (DYX1C1) and cell cycle progression 1 (CCPG1) genes on chromosome 15. The read-through transcript is a candidate for nonsense-mediated mRNA decay (NMD), and is thus unlikely to produce a protein product. NA
CTC-338M12.6 ENSG00000250900 ENSG00000250900 NA NA NA
ANK1 ENSG00000029534 286 ankyrin 1 Ankyrins are a family of proteins that link the integral membrane proteins to the underlying spectrin-actin cytoskeleton and play key roles in activities such as cell motility, activation, proliferation, contact and the maintenance of specialized membrane domains. Multiple isoforms of ankyrin with different affinities for various target proteins are expressed in a tissue-specific, developmentally regulated manner. Most ankyrins are typically composed of three structural domains: an amino-terminal domain containing multiple ankyrin repeats; a central region with a highly conserved spectrin binding domain; and a carboxy-terminal regulatory domain which is the least conserved and subject to variation. Ankyrin 1, the prototype of this family, was first discovered in the erythrocytes, but since has also been found in brain and muscles. Mutations in erythrocytic ankyrin 1 have been associated in approximately half of all patients with hereditary spherocytosis. Complex patterns of alternative splicing in the regulatory domain, giving rise to different isoforms of ankyrin 1 have been described. Truncated muscle-specific isoforms of ankyrin 1 resulting from usage of an alternate promoter have also been identified. NA
GALNT3 ENSG00000115339 2591 polypeptide N-acetylgalactosaminyltransferase 3 This gene encodes UDP-GalNAc transferase 3, a member of the GalNAc-transferases family. This family transfers an N-acetyl galactosamine to the hydroxyl group of a serine or threonine residue in the first step of O-linked oligosaccharide biosynthesis. Individual GalNAc-transferases have distinct activities and initiation of O-glycosylation is regulated by a repertoire of GalNAc-transferases. The protein encoded by this gene is highly homologous to other family members, however the enzymes have different substrate specificities. NA
TRIM14 ENSG00000106785 9830 tripartite motif containing 14 The protein encoded by this gene is a member of the tripartite motif (TRIM) family. The TRIM motif includes three zinc-binding domains, a RING, a B-box type 1 and a B-box type 2, and a coiled-coil region. The protein localizes to cytoplasmic bodies and its function has not been determined. Alternative splicing results in multiple transcript variants. NA
CTB-51J22.1 ENSG00000232415 ENSG00000232415 NA NA NA
STARD13 ENSG00000133121 90627 StAR related lipid transfer domain containing 13 This gene encodes a protein which contains an N-terminal sterile alpha motif (SAM) for protein-protein interactions, followed by an ATP/GTP-binding motif, a GTPase-activating protein (GAP) domain, and a C-terminal STAR-related lipid transfer (START) domain. It may be involved in regulation of cytoskeletal reorganization, cell proliferation, and cell motility, and acts as a tumor suppressor in hepatoma cells. The gene is located in a region of chromosome 13 that is associated with loss of heterozygosity in hepatocellular carcinomas. Alternatively spliced transcript variants encoding different isoforms have been described for this gene. NA
RP11-10C24.3 ENSG00000271643 ENSG00000271643 NA NA NA
SERPINB6 ENSG00000124570 5269 serpin family B member 6 The protein encoded by this gene is a member of the serpin (serine proteinase inhibitor) superfamily, and ovalbumin(ov)-serpin subfamily. It was originally discovered as a placental thrombin inhibitor. The mouse homolog was found to be expressed in the hair cells of the inner ear. Mutations in this gene are associated with nonsyndromic progressive hearing loss, suggesting that this serpin plays an important role in the inner ear in the protection against leakage of lysosomal content during stress, and that loss of this protection results in cell death and sensorineural hearing loss. Alternatively spliced transcript variants have been found for this gene. NA
S100A11 ENSG00000163191 6282 S100 calcium binding protein A11 The protein encoded by this gene is a member of the S100 family of proteins containing 2 EF-hand calcium-binding motifs. S100 proteins are localized in the cytoplasm and/or nucleus of a wide range of cells, and involved in the regulation of a number of cellular processes such as cell cycle progression and differentiation. S100 genes include at least 13 members which are located as a cluster on chromosome 1q21. This protein may function in motility, invasion, and tubulin polymerization. Chromosomal rearrangements and altered expression of this gene have been implicated in tumor metastasis. NA
RP5-1142A6.9 ENSG00000260121 ENSG00000260121 NA NA NA
MMP17 ENSG00000198598 4326 matrix metallopeptidase 17 This gene encodes a member of the peptidase M10 family and membrane-type subfamily of matrix metalloproteinases (MMPs). Proteins in this family are involved in the breakdown of extracellular matrix in normal physiological processes, such as embryonic development, reproduction, and tissue remodeling, as well as in disease processes, such as arthritis and metastasis. Members of this subfamily contain a transmembrane domain suggesting that these proteins are expressed at the cell surface rather than secreted. The encoded preproprotein is proteolytically processed to generate the mature protease. This protein is unique among the membrane-type matrix metalloproteinases in that it is anchored to the cell membrane via a glycosylphosphatidylinositol (GPI) anchor. Elevated expression of the encoded protein has been observed in osteoarthritis and multiple human cancers. NA
IGF1R ENSG00000140443 3480 insulin like growth factor 1 receptor This receptor binds insulin-like growth factor with a high affinity. It has tyrosine kinase activity. The insulin-like growth factor I receptor plays a critical role in transformation events. Cleavage of the precursor generates alpha and beta subunits. It is highly overexpressed in most malignant tissues where it functions as an anti-apoptotic agent by enhancing cell survival. Alternatively spliced transcript variants encoding distinct isoforms have been found for this gene. NA
# Save the queried Ensembl IDs for cluster file 5, one ID per line
write.table(as.factor(out$query), paste0("../utilities/GTEX2013_sparse_fac_voom/gene_names_clus_", 5, ".txt"),
            col.names = FALSE, row.names = FALSE, quote = FALSE);
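
The `write.table` call above saves every queried ID, including any that mygene could not match (rows flagged `notfound`) and IDs that come back more than once. If a later step expects one line per annotated gene, a filter along these lines may help. This is only a sketch: `matched_queries` is a hypothetical helper, not part of this analysis, and the toy data frame stands in for `as.data.frame(out)` using a few rows from the Factor 6 table.

```r
# Hypothetical helper: keep the Ensembl IDs that were matched, once each.
matched_queries <- function(annot) {
  annot <- as.data.frame(annot)
  # Assumption: `notfound` is present only in some tables, so check for it first
  if ("notfound" %in% names(annot)) {
    missing <- !is.na(annot$notfound) & annot$notfound
    annot <- annot[!missing, , drop = FALSE]
  }
  # Drop repeated query IDs while keeping the original order
  unique(as.character(annot$query))
}

# Toy stand-in for as.data.frame(out); values copied from the Factor 6 table
toy <- data.frame(
  query    = c("ENSG00000171747", "ENSG00000165862",
               "ENSG00000196126", "ENSG00000196126"),
  symbol   = c("LGALS4", NA, "HLA-DRB1", "LOC105369230"),
  notfound = c(NA, TRUE, NA, NA),
  stringsAsFactors = FALSE
)

ids <- matched_queries(toy)

# Written to a temporary file here so the sketch does not touch ../utilities
out_file <- file.path(tempdir(), "gene_names_clus_toy.txt")
writeLines(ids, out_file)
```

Whether unmatched IDs should be dropped depends on what the downstream enrichment step expects; the original call keeps them.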

Factor 6 Annotations

out <- mygene::queryMany(gene_list[6,],  scopes="ensembl.gene", fields=c("name", "summary", "symbol"), species="human");
## Finished
## Pass returnall=TRUE to return lists of duplicate or missing query terms.
kable(as.data.frame(out))
summary query name X_id symbol notfound
The galectins are a family of beta-galactoside-binding proteins implicated in modulating cell-cell and cell-matrix interactions. The expression of this gene is restricted to small intestine, colon, and rectum, and it is underexpressed in colorectal cancer. ENSG00000171747 galectin 4 3960 LGALS4 NA
NA ENSG00000165862 NA NA NA TRUE
The protein encoded by this gene belongs to the ZIP family of zinc transporters that transport zinc into cells from outside, and play a crucial role in controlling intracellular zinc levels. Zinc is an essential cofactor for many enzymes and proteins involved in gene transcription, growth, development and differentiation. Mutations in this gene have been associated with autosomal dominant high myopia (MYP24). Alternatively spliced transcript variants have been found for this gene. ENSG00000139540 solute carrier family 39 member 5 283375 SLC39A5 NA
This gene is a type I subclass member of the Reg gene family. The Reg gene family is a multigene family grouped into four subclasses, types I, II, III and IV, based on the primary structures of the encoded proteins. This gene encodes a protein that is secreted by the exocrine pancreas. It is associated with islet cell regeneration and diabetogenesis and may be involved in pancreatic lithogenesis. Reg family members REG1B, REGL, PAP and this gene are tandemly clustered on chromosome 2p12 and may have arisen from the same ancestral gene by gene duplication. ENSG00000115386 regenerating family member 1 alpha 5967 REG1A NA
This gene encodes a member of the steroid-thyroid hormone-retinoid receptor superfamily. The encoded protein may act as a transcriptional activator. The protein can efficiently bind the NGFI-B Response Element (NBRE). Three different versions of extraskeletal myxoid chondrosarcomas (EMCs) are the result of reciprocal translocations between this gene and other genes. The translocation breakpoints are associated with Nuclear Receptor Subfamily 4, Group A, Member 3 (on chromosome 9) and either Ewing Sarcoma Breakpoint Region 1 (on chromosome 22), RNA Polymerase II, TATA Box-Binding Protein-Associated Factor, 68-KD (on chromosome 17), or Transcription factor 12 (on chromosome 15). Multiple transcript variants encoding different isoforms have been found for this gene. ENSG00000119508 nuclear receptor subfamily 4 group A member 3 8013 NR4A3 NA
This gene encodes a trypsinogen, which is a member of the trypsin family of serine proteases. This enzyme is expressed in the brain and pancreas and is resistant to common trypsin inhibitors. It is active on peptide linkages involving the carboxyl group of lysine or arginine. This gene is localized to the locus of T cell receptor beta variable orphans on chromosome 9. Four transcript variants encoding different isoforms have been described for this gene. ENSG00000010438 protease, serine 3 5646 PRSS3 NA
This gene encodes a protein precursor of the digestive enzyme pepsin, a member of the peptidase A1 family of endopeptidases. The encoded precursor is secreted by gastric chief cells and undergoes autocatalytic cleavage in acidic conditions to form the active enzyme, which functions in the digestion of dietary proteins. This gene is found in a cluster of related genes on chromosome 11, each of which encodes one of multiple pepsinogens. Pepsinogen levels in serum may serve as a biomarker for atrophic gastritis and gastric cancer. ENSG00000229859 pepsinogen 3, group I (pepsinogen A) 643834 PGA3 NA
The protein encoded by this gene belongs to the ‘regulator of G protein signaling’ family. It inhibits signal transduction by increasing the GTPase activity of G protein alpha subunits. It also may play a role in regulating the kinetics of signaling in the phototransduction cascade. ENSG00000143333 regulator of G-protein signaling 16 6004 RGS16 NA
This gene encodes a member of the muscle segment homeobox gene family. The encoded protein functions as a transcriptional repressor during embryogenesis through interactions with components of the core transcription complex and other homeoproteins. It may also have roles in limb-pattern formation, craniofacial development, particularly odontogenesis, and tumor growth inhibition. Mutations in this gene, which was once known as homeobox 7, have been associated with nonsyndromic cleft lip with or without cleft palate 5, Witkop syndrome, Wolf-Hirschhorn syndrome, and autosomal dominant hypodontia. ENSG00000163132 msh homeobox 1 4487 MSX1 NA
This gene is a type I subclass member of the Reg gene family. The Reg gene family is a multigene family grouped into four subclasses, types I, II, III and IV based on the primary structures of the encoded proteins. This gene encodes a protein secreted by the exocrine pancreas that is highly similar to the REG1A protein. The related REG1A protein is associated with islet cell regeneration and diabetogenesis, and may be involved in pancreatic lithogenesis. Reg family members REG1A, REGL, PAP and this gene are tandemly clustered on chromosome 2p12 and may have arisen from the same ancestral gene by gene duplication. ENSG00000172023 regenerating family member 1 beta 5968 REG1B NA
The protein encoded by this gene is a trypsin inhibitor, which is secreted from pancreatic acinar cells into pancreatic juice. It is thought to function in the prevention of trypsin-catalyzed premature activation of zymogens within the pancreas and the pancreatic duct. Mutations in this gene are associated with hereditary pancreatitis and tropical calcific pancreatitis. ENSG00000164266 serine peptidase inhibitor, Kazal type 1 6690 SPINK1 NA
This gene encodes a pancreatic secretory protein that may be involved in cell proliferation or differentiation. It has similarity to the C-type lectin superfamily. The enhanced expression of this gene is observed during pancreatic inflammation and liver carcinogenesis. The mature protein also functions as an antimicrobial protein with antibacterial activity. Alternate splicing results in multiple transcript variants that encode the same protein. ENSG00000172016 regenerating family member 3 alpha 5068 REG3A NA
This gene encodes a member of the syntaxin family. Syntaxins have been implicated in the targeting and fusion of intracellular transport vesicles. This family member may regulate protein transport among late endosomes and the trans-Golgi network. Mutations in this gene have been associated with familial hemophagocytic lymphohistiocytosis. ENSG00000135604 syntaxin 11 8676 STX11 NA
This gene encodes a protein containing coiled-coil domains. The encoded protein functions in outer dynein arm assembly and is required for motile cilia function. Mutations in this gene result in primary ciliary dyskinesia. Alternative splicing results in multiple transcript variants encoding different isoforms. ENSG00000198003 coiled-coil domain containing 151 115948 CCDC151 NA
NA ENSG00000139572 G protein-coupled receptor 84 53831 GPR84 NA
NA ENSG00000157315 transmembrane p24 trafficking protein 6 146456 TMED6 NA
This gene encodes a type I membrane glycoprotein containing two extracellular immunoglobulin domains, a transmembrane and a cytoplasmic domain. This gene is expressed by various cell types, including B cells, a subset of T cells, thymocytes, endothelial cells, and neurons. The encoded protein plays an important role in immunosuppression and regulation of anti-tumor activity. Alternative splicing results in multiple transcript variants encoding different isoforms. ENSG00000091972 CD200 molecule 4345 CD200 NA
NA ENSG00000211454 aldo-keto reductase family 7-like (gene/pseudogene) ENSG00000211454 AKR7L NA
NA ENSG00000108932 solute carrier family 16 member 6 9120 SLC16A6 NA
Aldo-keto reductases, such as AKR7A3, are involved in the detoxification of aldehydes and ketones. ENSG00000162482 aldo-keto reductase family 7 member A3 22977 AKR7A3 NA
NA ENSG00000131094 complement component 1, q subcomponent-like 1 10882 C1QL1 NA
This gene encodes a member of the low density lipoprotein receptor (LDLR) family. Low density lipoprotein receptors are cell surface proteins that play roles in both signal transduction and receptor-mediated endocytosis of specific ligands for lysosomal degradation. The encoded protein plays a critical role in the migration of neurons during development by mediating Reelin signaling, and also functions as a receptor for the cholesterol transport protein apolipoprotein E. Expression of this gene may be a marker for major depressive disorder. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene. ENSG00000157193 LDL receptor related protein 8 7804 LRP8 NA
This gene encodes a member of the cytochrome P450 superfamily of enzymes. The cytochrome P450 proteins are monooxygenases which catalyze many reactions involved in drug metabolism and synthesis of cholesterol, steroids and other lipids. The encoded protein metabolizes drugs as well as the steroid hormones testosterone and progesterone. This gene is part of a cluster of cytochrome P450 genes on chromosome 7q21.1. Two pseudogenes of this gene have been identified within this cluster on chromosome 7. Expression of this gene is widely variable among populations, and a single nucleotide polymorphism that affects transcript splicing has been associated with susceptibility to hypertension. Alternative splicing results in multiple transcript variants. ENSG00000106258 cytochrome P450 family 3 subfamily A member 5 1577 CYP3A5 NA
The protein encoded by this gene is found to be down-regulated in human gastric cancer tissue as compared to normal gastric mucosa. ENSG00000169605 gastrokine 1 56287 GKN1 NA
Metabolic N-oxidation of the diet-derived amino-trimethylamine (TMA) is mediated by flavin-containing monooxygenase and is subject to an inherited FMO3 polymorphism in man resulting in a small subpopulation with reduced TMA N-oxidation capacity resulting in fish odor syndrome Trimethylaminuria. Three forms of the enzyme, FMO1 found in fetal liver, FMO2 found in adult liver, and FMO3 are encoded by genes clustered in the 1q23-q25 region. Flavin-containing monooxygenases are NADPH-dependent flavoenzymes that catalyze the oxidation of soft nucleophilic heteroatom centers in drugs, pesticides, and xenobiotics. Alternative splicing results in multiple transcript variants. ENSG00000131781 flavin containing monooxygenase 5 2330 FMO5 NA
NA ENSG00000244124 ATP1B3 antisense RNA 1 ENSG00000244124 ATP1B3-AS1 NA
NA ENSG00000230701 F-box and WD repeat domain containing 4 pseudogene 1 26226 FBXW4P1 NA
This gene encodes an integral membrane protein that is secreted from intracellular zymogen granules and associates with the plasma membrane via glycosylphosphatidylinositol (GPI) linkage. The encoded protein binds pathogens such as enterobacteria, thereby playing an important role in the innate immune response. The C-terminus of this protein is related to the C-terminus of the protein encoded by the neighboring gene, uromodulin (UMOD). Alternative splicing results in multiple transcript variants. ENSG00000169347 glycoprotein 2 2813 GP2 NA
This gene encodes gastric lipase, an enzyme involved in the digestion of dietary triglycerides in the gastrointestinal tract, and responsible for 30% of fat digestion processes occurring in human. It is secreted by gastric chief cells in the fundic mucosa of the stomach, and it hydrolyzes the ester bonds of triglycerides under acidic pH conditions. The gene is a member of a conserved gene family of lipases that play distinct roles in neutral lipid metabolism. Several transcript variants encoding different isoforms have been found for this gene. ENSG00000182333 lipase F, gastric type 8513 LIPF NA
NA ENSG00000214193 SH3 domain containing 21 79729 SH3D21 NA
This gene was originally cloned from human myeloblasts and found to be selectively expressed in inflamed colonic epithelium. This gene encodes a member of the olfactomedin family. The encoded protein is an antiapoptotic factor that promotes tumor growth and is an extracellular matrix glycoprotein that facilitates cell adhesion. ENSG00000102837 olfactomedin 4 10562 OLFM4 NA
NA ENSG00000172738 transmembrane protein 217 221468 TMEM217 NA
NA ENSG00000099625 CACN beta subunit associated regulatory protein 255057 CBARP NA
This gene encodes a member of the Ras superfamily of small GTPases and is induced by dexamethasone. The encoded protein is an activator of G-protein signaling and acts as a direct nucleotide exchange factor for Gi-Go proteins. This protein interacts with the neuronal nitric oxide adaptor protein CAPON, and a nuclear adaptor protein FE65, which interacts with the Alzheimer’s disease amyloid precursor protein. This gene may play a role in dexamethasone-induced alterations in cell morphology, growth and cell-extracellular matrix interactions. Epigenetic inactivation of this gene is closely correlated with resistance to dexamethasone in multiple myeloma cells. Alternatively spliced transcript variants encoding different isoforms have been found for this gene. ENSG00000108551 ras related dexamethasone induced 1 51655 RASD1 NA
The protein encoded by this gene belongs to the innexin family. Innexin family members are the structural components of gap junctions. This protein and pannexin 1 are abundantly expressed in central nervous system (CNS) and are coexpressed in various neuronal populations. Studies in Xenopus oocytes suggest that this protein alone and in combination with pannexin 1 may form cell type-specific gap junctions with distinct properties. Multiple transcript variants encoding different isoforms have been found for this gene. ENSG00000073150 pannexin 2 56666 PANX2 NA
NA ENSG00000010282 hedgehog acyltransferase-like 57467 HHATL NA
The protein encoded by this gene belongs to the WD repeat-containing family of proteins, which function in the formation of protein-protein complexes in a variety of biological pathways. This family member appears to function in the determination of mean platelet volume (MPV), and polymorphisms in this gene have been associated with variance in MPV. Alternative splicing of this gene results in multiple transcript variants. ENSG00000158023 WD repeat domain 66 144406 WDR66 NA
NA ENSG00000237188 NA ENSG00000237188 RP11-337C18.8 NA
NA ENSG00000149564 endothelial cell adhesion molecule 90952 ESAM NA
This gene encodes a member of the steroid-thyroid hormone-retinoid receptor superfamily. Expression is induced by phytohemagglutinin in human lymphocytes and by serum stimulation of arrested fibroblasts. The encoded protein acts as a nuclear transcription factor. Translocation of the protein from the nucleus to mitochondria induces apoptosis. Multiple transcript variants encoding different isoforms have been found for this gene. ENSG00000123358 nuclear receptor subfamily 4 group A member 1 3164 NR4A1 NA
The Fos gene family consists of 4 members: FOS, FOSB, FOSL1, and FOSL2. These genes encode leucine zipper proteins that can dimerize with proteins of the JUN family, thereby forming the transcription factor complex AP-1. As such, the FOS proteins have been implicated as regulators of cell proliferation, differentiation, and transformation. Several transcript variants encoding different isoforms have been found for this gene. ENSG00000175592 FOS like 1, AP-1 transcription factor subunit 8061 FOSL1 NA
This gene encodes a member of the selenium-binding protein family. Selenium is an essential nutrient that exhibits potent anticarcinogenic properties, and deficiency of selenium may cause certain neurologic diseases. The effects of selenium in preventing cancer and neurologic diseases may be mediated by selenium-binding proteins, and decreased expression of this gene may be associated with several types of cancer. The encoded protein may play a selenium-dependent role in ubiquitination/deubiquitination-mediated protein degradation. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene. ENSG00000143416 selenium binding protein 1 8991 SELENBP1 NA
NA ENSG00000108187 phenazine biosynthesis like protein domain containing 64081 PBLD NA
This gene encodes a zinc finger protein containing a KRAB (Kruppel-associated box) domain found in transcriptional repressors. This gene may be methylated and silenced in cancer cells. This gene is located within a differentially methylated region (DMR) and shows allele-specific expression in placenta. Alternative splicing and the use of alternative promoters results in multiple transcript variants encoding the same protein. ENSG00000130844 zinc finger protein 331 55422 ZNF331 NA
The protein encoded by this gene has substantial phospholipase activity and may be involved in lipoprotein metabolism and vascular biology. This protein is designated a member of the TG lipase family by its sequence and characteristic lid region which provides substrate specificity for enzymes of the TG lipase family. ENSG00000101670 lipase G, endothelial type 9388 LIPG NA
NA ENSG00000230280 heterogeneous nuclear ribonucleoprotein A1 pseudogene 59 ENSG00000230280 HNRNPA1P59 NA
This gene encodes a member of the semaphorin family of soluble and transmembrane proteins. Semaphorins are involved in numerous functions, including axon guidance, morphogenesis, carcinogenesis, and immunomodulation. The encoded protein is a single-pass type I membrane protein containing an immunoglobulin-like C2-type domain, a PSI domain and a sema domain. It inhibits axonal extension by providing local signals to specify territories inaccessible for growing axons. It is an activator of T-cell-mediated immunity and suppresses vascular endothelial growth factor (VEGF)-mediated endothelial cell migration and proliferation in vitro and angiogenesis in vivo. Mutations in this gene are associated with retinal degenerative diseases including retinitis pigmentosa type 35 (RP35) and cone-rod dystrophy type 10 (CORD10). Multiple alternatively spliced transcript variants encoding different isoforms have been identified. ENSG00000196189 semaphorin 4A 64218 SEMA4A NA
This gene encodes a serine/threonine protein kinase. Although this gene product is similar to serum- and glucocorticoid-induced protein kinase (SGK), this gene is not induced by serum or glucocorticoids. This gene is induced in response to signals that activate phosphatidylinositol 3-kinase, which is also true for SGK. Alternative splicing results in multiple transcript variants. ENSG00000101049 SGK2, serine/threonine kinase 2 10110 SGK2 NA
NA ENSG00000271769 NA NA NA TRUE
NA ENSG00000250606 NA NA NA TRUE
The protein encoded by this gene is a member of the fibroblast growth factor (FGF) family. FGF family members possess broad mitogenic and cell survival activities, and are involved in a variety of biological processes, including embryonic development, cell growth, morphogenesis, tissue repair, tumor growth and invasion. The function of this gene has not yet been determined. The expression pattern of the mouse homolog implies a role in nervous system development. Alternative splicing results in multiple transcript variants. ENSG00000161958 fibroblast growth factor 11 2256 FGF11 NA
The protein encoded by this gene is a homeodomain protein that lacks certain conserved residues required for DNA binding. It was reported that choriocarcinoma cell lines and tissues failed to express this gene, which suggested the possible involvement of this gene in malignant conversion of placental trophoblasts. Studies in mice suggest that this protein may interact with serum response factor (SRF) and modulate SRF-dependent cardiac-specific gene expression and cardiac development. Multiple alternatively spliced transcript variants have been identified for this gene. ENSG00000171476 HOP homeobox 84525 HOPX NA
NA ENSG00000226445 uncharacterized LOC101929523 101929523 LOC101929523 NA
NA ENSG00000168490 phytanoyl-CoA 2-hydroxylase interacting protein 9796 PHYHIP NA
Three different forms of human pancreatic procarboxypeptidase A have been isolated. The encoded protein represents the A2 form, which is a monomeric protein with different biochemical properties from the A1 and A3 forms. The A2 form of pancreatic procarboxypeptidase acts on aromatic C-terminal residues and is a secreted protein. ENSG00000158516 carboxypeptidase A2 1358 CPA2 NA
NA ENSG00000135245 hypoxia inducible lipid droplet associated 29923 HILPDA NA
This gene functions in the regulation of autophagy, a lysosomal degradation pathway. This gene also functions as an antisense transcript in the posttranscriptional regulation of the endothelial nitric oxide synthase 3 gene, which has 3’ overlap with this gene on the opposite strand. Mutations in this gene and disruption of the autophagy process have been associated with multiple cancers. Alternative splicing results in multiple transcript variants. ENSG00000181652 autophagy related 9B 285973 ATG9B NA
NA ENSG00000261504 uncharacterized LOC284648 284648 LOC284648 NA
The protein encoded by this gene contains a HMG box DNA binding domain. HMG boxes are found in many eukaryotic proteins involved in chromatin assembly, transcription and replication. This protein may function to regulate T-cell development. ENSG00000198846 thymocyte selection associated high mobility group box 9760 TOX NA
This gene encodes a member of the cysteine-aspartic acid protease (caspase) family of enzymes. Sequential activation of caspases plays a central role in the execution-phase of cell apoptosis. Caspases exist as inactive proenzymes which undergo proteolytic processing at conserved aspartic acid residues to produce two subunits, large and small, that dimerize to form the active enzyme. This protein is processed by caspases 7, 8 and 10, and is thought to function as a downstream enzyme in the caspase activation cascade. Alternative splicing of this gene results in multiple transcript variants that encode different isoforms. ENSG00000138794 caspase 6 839 CASP6 NA
This gene encodes a basic helix-loop-helix protein expressed in various tissues. The encoded protein can interact with ARNTL or compete for E-box binding sites in the promoter of PER1 and repress CLOCK/ARNTL’s transactivation of PER1. This gene is believed to be involved in the control of circadian rhythm and cell differentiation. ENSG00000134107 basic helix-loop-helix family member e40 8553 BHLHE40 NA
The protein encoded by this gene belongs to the P2X family of G-protein-coupled receptors. These proteins can form homo- and heterotrimers and function as ATP-gated ion channels and mediate rapid and selective permeability to cations. This protein is primarily localized to smooth muscle where it binds ATP and mediates synaptic transmission between neurons and from neurons to smooth muscle and may be responsible for sympathetic vasoconstriction in small arteries, arterioles and vas deferens. Mouse studies suggest that this receptor is essential for normal male reproductive function. This protein may also be involved in promoting apoptosis. ENSG00000108405 purinergic receptor P2X 1 5023 P2RX1 NA
NA ENSG00000129007 calmodulin like 4 91860 CALML4 NA
The protein encoded by this gene belongs to the class-3 semaphorin/collapsin family, whose members function in growth cone guidance during neuronal development. This family member inhibits axonal extension and has been shown to act as a tumor suppressor by inducing apoptosis. Alternative splicing of this gene results in multiple transcript variants. ENSG00000012171 semaphorin 3B 7869 SEMA3B NA
The protein encoded by this intronless gene is an endothelial-specific type I membrane receptor that binds thrombin. This binding results in the activation of protein C, which degrades clotting factors Va and VIIIa and reduces the amount of thrombin generated. Mutations in this gene are a cause of thromboembolic disease, also known as inherited thrombophilia. ENSG00000178726 thrombomodulin 7056 THBD NA
NA ENSG00000163995 actin binding LIM protein family member 2 84448 ABLIM2 NA
NA ENSG00000164620 RELT like 2 285613 RELL2 NA
Thyroglobulin (Tg) is a glycoprotein homodimer produced predominantly by the thyroid gland. It acts as a substrate for the synthesis of thyroxine and triiodothyronine as well as the storage of the inactive forms of thyroid hormone and iodine. Thyroglobulin is secreted from the endoplasmic reticulum to its site of iodination, and subsequent thyroxine biosynthesis, in the follicular lumen. Mutations in this gene cause thyroid dyshormonogenesis, manifested as goiter, and are associated with moderate to severe congenital hypothyroidism. Polymorphisms in this gene are associated with susceptibility to autoimmune thyroid diseases (AITD) such as Graves disease and Hashimoto thyroiditis. ENSG00000042832 thyroglobulin 7038 TG NA
NA ENSG00000198429 zinc finger protein 69 7620 ZNF69 NA
The galectins are a family of beta-galactoside-binding proteins implicated in modulating cell-cell and cell-matrix interactions. Differential and in situ hybridization studies indicate that this lectin is specifically expressed in keratinocytes and found mainly in stratified squamous epithelium. A duplicate copy of this gene (GeneID:3963) is found adjacent to, but on the opposite strand on chromosome 19. ENSG00000178934 galectin 7B 653499 LGALS7B NA
NA ENSG00000259539 NA ENSG00000259539 CTD-2651B20.1 NA
NA ENSG00000234043 NA ENSG00000234043 RP11-56M3.1 NA
This gene encodes a protein that shares sequence similarity to nucleosome assembly factors, but may be localized to the cytoplasm rather than the nucleus. Expression of this gene is downregulated in hepatocellular carcinomas. This gene is located within a differentially methylated region (DMR) and is imprinted and paternally expressed. There is a related pseudogene on chromosome 4. ENSG00000177432 nucleosome assembly protein 1 like 5 266812 NAP1L5 NA
This gene encodes a member of the stathmin family of phosphoproteins. Stathmin proteins function in microtubule dynamics and signal transduction. The encoded protein plays a regulatory role in neuronal growth and is also thought to be involved in osteogenesis. Reductions in the expression of this gene have been associated with Down’s syndrome and Alzheimer’s disease. Alternatively spliced transcript variants have been observed for this gene. A pseudogene of this gene is located on the long arm of chromosome 6. ENSG00000104435 stathmin 2 11075 STMN2 NA
This gene encodes a growth factor found in placenta which is homologous to vascular endothelial growth factor. Alternatively spliced transcripts encoding different isoforms have been found for this gene. ENSG00000119630 placental growth factor 5228 PGF NA
Fructose-1,6-bisphosphate aldolase (EC 4.1.2.13) is a tetrameric glycolytic enzyme that catalyzes the reversible conversion of fructose-1,6-bisphosphate to glyceraldehyde 3-phosphate and dihydroxyacetone phosphate. Vertebrates have 3 aldolase isozymes which are distinguished by their electrophoretic and catalytic properties. Differences indicate that aldolases A, B, and C are distinct proteins, the products of a family of related ‘housekeeping’ genes exhibiting developmentally regulated expression of the different isozymes. The developing embryo produces aldolase A, which is produced in even greater amounts in adult muscle where it can be as much as 5% of total cellular protein. In adult liver, kidney and intestine, aldolase A expression is repressed and aldolase B is produced. In brain and other nervous tissue, aldolase A and C are expressed about equally. There is a high degree of homology between aldolase A and C. Defects in ALDOB cause hereditary fructose intolerance. ENSG00000136872 aldolase, fructose-bisphosphate B 229 ALDOB NA
This gene encodes a trypsinogen, which is a member of the trypsin family of serine proteases. This enzyme is secreted by the pancreas and cleaved to its active form in the small intestine. It is active on peptide linkages involving the carboxyl group of lysine or arginine. Mutations in this gene are associated with hereditary pancreatitis. This gene and several other trypsinogen genes are localized to the T cell receptor beta locus on chromosome 7. ENSG00000204983 protease, serine 1 5644 PRSS1 NA
NA ENSG00000267274 NA ENSG00000267274 CTD-2006C1.12 NA
This gene encodes a member of the chondroitin N-acetylgalactosaminyltransferase family. These enzymes possess dual glucuronyltransferase and galactosaminyltransferase activity and play critical roles in the biosynthesis of chondroitin sulfate, a glycosaminoglycan involved in many biological processes including cell proliferation and morphogenesis. Decreased expression of this gene may play a role in colorectal cancer, and mutations in this gene are a cause of temtamy preaxial brachydactyly syndrome. ENSG00000131873 chondroitin sulfate synthase 1 22856 CHSY1 NA
NA ENSG00000132465 joining chain of multimeric IgA and IgM 3512 JCHAIN NA
NA ENSG00000203886 CYP17A1 antisense RNA 1 102724307 CYP17A1-AS1 NA
G protein-coupled receptors (GPCRs) play key roles in a variety of physiologic functions. Members of the leucine-rich GPCR (LGR) family, such as GPR48, have multiple N-terminal leucine-rich repeats (LRRs) and a 7-transmembrane domain (Weng et al., 2008 [PubMed 18424556]). ENSG00000205213 leucine rich repeat containing G protein-coupled receptor 4 55366 LGR4 NA
Zinc finger proteins, such as ZNF385A, are regulatory proteins that act as transcription factors, bind single- or double-stranded RNA, or interact with other proteins (Sharma et al., 2004 [PubMed 15527981]). ENSG00000161642 zinc finger protein 385A 25946 ZNF385A NA
NA ENSG00000240758 NA ENSG00000240758 RP11-155G14.6 NA
HLA-DRB1 belongs to the HLA class II beta chain paralogs. The class II molecule is a heterodimer consisting of an alpha (DRA) and a beta chain (DRB), both anchored in the membrane. It plays a central role in the immune system by presenting peptides derived from extracellular proteins. Class II molecules are expressed in antigen presenting cells (APC: B lymphocytes, dendritic cells, macrophages). The beta chain is approximately 26-28 kDa. It is encoded by 6 exons. Exon one encodes the leader peptide; exons 2 and 3 encode the two extracellular domains; exon 4 encodes the transmembrane domain; and exon 5 encodes the cytoplasmic tail. Within the DR molecule the beta chain contains all the polymorphisms specifying the peptide binding specificities. Hundreds of DRB1 alleles have been described and typing for these polymorphisms is routinely done for bone marrow and kidney transplantation. DRB1 is expressed at a level five times higher than its paralogs DRB3, DRB4 and DRB5. DRB1 is present in all individuals. Allelic variants of DRB1 are linked with either none or one of the genes DRB3, DRB4 and DRB5. There are 4 related pseudogenes: DRB2, DRB6, DRB7, DRB8 and DRB9. ENSG00000196126 major histocompatibility complex, class II, DR beta 1 3123 HLA-DRB1 NA
NA ENSG00000196126 HLA class II histocompatibility antigen, DRB1-7 beta chain 105369230 LOC105369230 NA
This gene likely encodes a member of the carboxypeptidase family of proteins. Cloning of a comparable locus in mouse indicates that the encoded protein contains a discoidin domain and a carboxypeptidase domain, but the protein appears to lack residues necessary for carboxypeptidase activity. ENSG00000088882 carboxypeptidase X (M14 family), member 1 56265 CPXM1 NA
NA ENSG00000256928 NA ENSG00000256928 RP11-809N8.2 NA
Protein disulfide isomerases (EC 5.3.4.1), such as PDIP, are endoplasmic reticulum (ER) resident proteins that catalyze protein folding and thiol-disulfide interchange reactions (Desilva et al., 1996 [PubMed 8561901]). ENSG00000185615 protein disulfide isomerase family A member 2 64714 PDIA2 NA
NA ENSG00000163040 coiled-coil domain containing 74A 90557 CCDC74A NA
The WNT gene family consists of structurally related genes which encode secreted signaling proteins. These proteins have been implicated in oncogenesis and in several developmental processes, including regulation of cell fate and patterning during embryogenesis. This gene is a member of the WNT gene family. It encodes a protein which shows 97%, 85%, and 63% amino acid identity with mouse, chicken, and Xenopus Wnt11 protein, respectively. This gene may play roles in the development of skeleton, kidney and lung, and is considered to be a plausible candidate gene for High Bone Mass Syndrome. ENSG00000085741 Wnt family member 11 7481 WNT11 NA
This gene encodes a member of the regulators of G protein signaling (RGS) family. The RGS proteins are signal transduction molecules which are involved in the regulation of heterotrimeric G proteins by acting as GTPase activators. This gene is a hypoxia-inducible factor-1 dependent, hypoxia-induced gene which is involved in the induction of endothelial apoptosis. This gene is also one of three genes on chromosome 1q contributing to elevated blood pressure. Alternatively spliced transcript variants have been identified. ENSG00000143248 regulator of G-protein signaling 5 8490 RGS5 NA
NA ENSG00000270172 NA NA NA TRUE
Defensins are a family of antimicrobial and cytotoxic peptides thought to be involved in host defense. They are abundant in the granules of neutrophils and also found in the epithelia of mucosal surfaces such as those of the intestine, respiratory tract, urinary tract, and vagina. Members of the defensin family are highly similar in protein sequence and distinguished by a conserved cysteine motif. Several of the alpha defensin genes appear to be clustered on chromosome 8. The protein encoded by this gene, defensin, alpha 5, is highly expressed in the secretory granules of Paneth cells of the ileum. ENSG00000164816 defensin alpha 5 1670 DEFA5 NA
This gene encodes neutrophil cytosolic factor 2, the 67-kilodalton cytosolic subunit of the multi-protein NADPH oxidase complex found in neutrophils. This oxidase produces a burst of superoxide which is delivered to the lumen of the neutrophil phagosome. Mutations in this gene, as well as in other NADPH oxidase subunits, can result in chronic granulomatous disease, a disease that causes recurrent infections by catalase-positive organisms. Alternative splicing results in multiple transcript variants encoding different isoforms. ENSG00000116701 neutrophil cytosolic factor 2 4688 NCF2 NA
LY6G6C belongs to a cluster of leukocyte antigen-6 (LY6) genes located in the major histocompatibility complex (MHC) class III region on chromosome 6. Members of the LY6 superfamily typically contain 70 to 80 amino acids, including 8 to 10 cysteines. Most LY6 proteins are attached to the cell surface by a glycosylphosphatidylinositol (GPI) anchor that is directly involved in signal transduction (Mallya et al., 2002 [PubMed 12079290]). ENSG00000204421 lymphocyte antigen 6 complex, locus G6C 80740 LY6G6C NA
NA ENSG00000249007 NA ENSG00000249007 RP11-510N19.5 NA
Elastases form a subfamily of serine proteases that hydrolyze many proteins in addition to elastin. Humans have six elastase genes which encode the structurally similar proteins elastase 1, 2, 2A, 2B, 3A, and 3B. Unlike other elastases, elastase 3B has little elastolytic activity. Like most of the human elastases, elastase 3B is secreted from the pancreas as a zymogen and, like other serine proteases such as trypsin, chymotrypsin and kallikrein, it has a digestive function in the intestine. Elastase 3B preferentially cleaves proteins after alanine residues. Elastase 3B may also function in the intestinal transport and metabolism of cholesterol. Both elastase 3A and elastase 3B have been referred to as protease E and as elastase 1, and excretion of this protein in fecal material is frequently used as a measure of pancreatic function in clinical assays. ENSG00000219073 chymotrypsin like elastase family member 3B 23436 CELA3B NA
TMEM97 is a conserved integral membrane protein that plays a role in controlling cellular cholesterol levels (Bartz et al., 2009 [PubMed 19583955]). ENSG00000109084 transmembrane protein 97 27346 TMEM97 NA
This gene encodes a type I transmembrane protein that is localized to junctional complexes between endothelial and epithelial cells and may have a role in cell-cell adhesion. Expression of this gene in white adipose tissue is implicated in adipocyte maturation and development of obesity. This gene is also essential for normal intestinal development and mutations in the gene are associated with congenital short bowel syndrome. ENSG00000166250 CXADR-like membrane protein 79827 CLMP NA
NA ENSG00000236047 NA ENSG00000236047 AC073410.1 NA
# Save the Ensembl gene IDs queried for factor 6, one per line
write.table(as.factor(out$query),
            file = paste0("../utilities/GTEX2013_sparse_fac_voom/gene_names_clus_", 6, ".txt"),
            col.names = FALSE, row.names = FALSE, quote = FALSE)
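
The Factor 6 table above contains one query with two hits (ENSG00000196126) and one query flagged notfound (ENSG00000270172), so writing out$query directly may put repeated or unmatched IDs into the cluster file. The following is a minimal sketch of one way to filter the IDs first. The helper name clean_query_ids and the toy data frame are assumptions made for illustration; they are not part of mygene or of the original analysis, and whether unmatched IDs should be dropped at all is a judgment call.

```r
# Hypothetical helper (not part of mygene or the original analysis):
# return the unique query IDs that were matched by the annotation service.
clean_query_ids <- function(res) {
  res <- as.data.frame(res)
  # The 'notfound' column is only present when at least one query failed to match
  if ("notfound" %in% names(res)) {
    res <- res[is.na(res$notfound) | !res$notfound, , drop = FALSE]
  }
  # A query with several hits appears on several rows; keep each ID once
  unique(as.character(res$query))
}

# Toy stand-in for a queryMany() result, built from rows of the Factor 6 table
toy <- data.frame(
  query    = c("ENSG00000196126", "ENSG00000196126",
               "ENSG00000088882", "ENSG00000270172"),
  symbol   = c("HLA-DRB1", "LOC105369230", "CPXM1", NA),
  notfound = c(NA, NA, NA, TRUE),
  stringsAsFactors = FALSE
)

ids <- clean_query_ids(toy)
print(ids)
```

In the analysis itself, clean_query_ids(out) could be passed to write.table() in place of as.factor(out$query), under the assumption that only matched, unique IDs are wanted downstream.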

Factor 7 Annotations

out <- mygene::queryMany(gene_list[7,],  scopes="ensembl.gene", fields=c("name", "summary", "symbol"), species="human");
## Finished
## Pass returnall=TRUE to return lists of duplicate or missing query terms.
kable(as.data.frame(out))
query X_id name summary symbol notfound
ENSG00000171476 84525 HOP homeobox The protein encoded by this gene is a homeodomain protein that lacks certain conserved residues required for DNA binding. It was reported that choriocarcinoma cell lines and tissues failed to express this gene, which suggested the possible involvement of this gene in malignant conversion of placental trophoblasts. Studies in mice suggest that this protein may interact with serum response factor (SRF) and modulate SRF-dependent cardiac-specific gene expression and cardiac development. Multiple alternatively spliced transcript variants have been identified for this gene. HOPX NA
ENSG00000079215 6507 solute carrier family 1 member 3 This gene encodes a member of a high affinity glutamate transporter family. This gene functions in the termination of excitatory neurotransmission in the central nervous system. Mutations are associated with episodic ataxia, Type 6. Alternative splicing results in multiple transcript variants. SLC1A3 NA
ENSG00000234964 ENSG00000234964 fatty acid binding protein 5 pseudogene 7 NA FABP5P7 NA
ENSG00000117115 11240 peptidyl arginine deiminase 2 This gene encodes a member of the peptidyl arginine deiminase family of enzymes, which catalyze the post-translational deimination of proteins by converting arginine residues into citrullines in the presence of calcium ions. The family members have distinct substrate specificities and tissue-specific expression patterns. The type II enzyme is the most widely expressed family member. Known substrates for this enzyme include myelin basic protein in the central nervous system and vimentin in skeletal muscle and macrophages. This enzyme is thought to play a role in the onset and progression of neurodegenerative human disorders, including Alzheimer disease and multiple sclerosis, and it has also been implicated in glaucoma pathogenesis. This gene exists in a cluster with four other paralogous genes. PADI2 NA
ENSG00000161682 284069 family with sequence similarity 171 member A2 NA FAM171A2 NA
ENSG00000228314 54055 cytochrome P450 family 4 subfamily F member 29, pseudogene NA CYP4F29P NA
ENSG00000185615 64714 protein disulfide isomerase family A member 2 Protein disulfide isomerases (EC 5.3.4.1), such as PDIP, are endoplasmic reticulum (ER) resident proteins that catalyze protein folding and thiol-disulfide interchange reactions (Desilva et al., 1996 [PubMed 8561901]). PDIA2 NA
ENSG00000108405 5023 purinergic receptor P2X 1 The protein encoded by this gene belongs to the P2X family of G-protein-coupled receptors. These proteins can form homo- and heterotrimers and function as ATP-gated ion channels and mediate rapid and selective permeability to cations. This protein is primarily localized to smooth muscle, where it binds ATP and mediates synaptic transmission between neurons and from neurons to smooth muscle, and may be responsible for sympathetic vasoconstriction in small arteries, arterioles and vas deferens. Mouse studies suggest that this receptor is essential for normal male reproductive function. This protein may also be involved in promoting apoptosis. P2RX1 NA
ENSG00000172023 5968 regenerating family member 1 beta This gene is a type I subclass member of the Reg gene family. The Reg gene family is a multigene family grouped into four subclasses, types I, II, III and IV based on the primary structures of the encoded proteins. This gene encodes a protein secreted by the exocrine pancreas that is highly similar to the REG1A protein. The related REG1A protein is associated with islet cell regeneration and diabetogenesis, and may be involved in pancreatic lithogenesis. Reg family members REG1A, REGL, PAP and this gene are tandemly clustered on chromosome 2p12 and may have arisen from the same ancestral gene by gene duplication. REG1B NA
ENSG00000110852 9976 C-type lectin domain family 2 member B This gene encodes a member of the C-type lectin/C-type lectin-like domain (CTL/CTLD) superfamily. Members of this family share a common protein fold and have diverse functions, such as cell adhesion, cell-cell signalling, glycoprotein turnover, and roles in inflammation and immune response. The encoded type 2 transmembrane protein may function as a cell activation antigen. An alternative splice variant has been described but its full-length sequence has not been determined. This gene is closely linked to other CTL/CTLD superfamily members on chromosome 12p13 in the natural killer gene complex region. CLEC2B NA
ENSG00000115386 5967 regenerating family member 1 alpha This gene is a type I subclass member of the Reg gene family. The Reg gene family is a multigene family grouped into four subclasses, types I, II, III and IV, based on the primary structures of the encoded proteins. This gene encodes a protein that is secreted by the exocrine pancreas. It is associated with islet cell regeneration and diabetogenesis and may be involved in pancreatic lithogenesis. Reg family members REG1B, REGL, PAP and this gene are tandemly clustered on chromosome 2p12 and may have arisen from the same ancestral gene by gene duplication. REG1A NA
ENSG00000171444 4163 mutated in colorectal cancers This gene is a candidate colorectal tumor suppressor gene that is thought to negatively regulate cell cycle progression. The orthologous gene in the mouse expresses a phosphoprotein associated with the plasma membrane and membrane organelles, and overexpression of the mouse protein inhibits entry into S phase. Multiple transcript variants encoding different isoforms have been found for this gene. MCC NA
ENSG00000172016 5068 regenerating family member 3 alpha This gene encodes a pancreatic secretory protein that may be involved in cell proliferation or differentiation. It has similarity to the C-type lectin superfamily. The enhanced expression of this gene is observed during pancreatic inflammation and liver carcinogenesis. The mature protein also functions as an antimicrobial protein with antibacterial activity. Alternate splicing results in multiple transcript variants that encode the same protein. REG3A NA
ENSG00000204983 5644 protease, serine 1 This gene encodes a trypsinogen, which is a member of the trypsin family of serine proteases. This enzyme is secreted by the pancreas and cleaved to its active form in the small intestine. It is active on peptide linkages involving the carboxyl group of lysine or arginine. Mutations in this gene are associated with hereditary pancreatitis. This gene and several other trypsinogen genes are localized to the T cell receptor beta locus on chromosome 7. PRSS1 NA
ENSG00000088882 56265 carboxypeptidase X (M14 family), member 1 This gene likely encodes a member of the carboxypeptidase family of proteins. Cloning of a comparable locus in mouse indicates that the encoded protein contains a discoidin domain and a carboxypeptidase domain, but the protein appears to lack residues necessary for carboxypeptidase activity. CPXM1 NA
ENSG00000079385 634 carcinoembryonic antigen related cell adhesion molecule 1 This gene encodes a member of the carcinoembryonic antigen (CEA) gene family, which belongs to the immunoglobulin superfamily. Two subgroups of the CEA family, the CEA cell adhesion molecules and the pregnancy-specific glycoproteins, are located within a 1.2 Mb cluster on the long arm of chromosome 19. Eleven pseudogenes of the CEA cell adhesion molecule subgroup are also found in the cluster. The encoded protein was originally described in bile ducts of liver as biliary glycoprotein. Subsequently, it was found to be a cell-cell adhesion molecule detected on leukocytes, epithelia, and endothelia. The encoded protein mediates cell adhesion via homophilic as well as heterophilic binding to other proteins of the subgroup. Multiple cellular activities have been attributed to the encoded protein, including roles in the differentiation and arrangement of tissue three-dimensional structure, angiogenesis, apoptosis, tumor suppression, metastasis, and the modulation of innate and adaptive immune responses. Multiple transcript variants encoding different isoforms have been reported, but the full-length nature of all variants has not been defined. CEACAM1 NA
ENSG00000137857 53905 dual oxidase 1 The protein encoded by this gene is a glycoprotein and a member of the NADPH oxidase family. The synthesis of thyroid hormone is catalyzed by a protein complex located at the apical membrane of thyroid follicular cells. This complex contains an iodide transporter, thyroperoxidase, and a peroxide generating system that includes proteins encoded by this gene and the similar DUOX2 gene. This protein is known as dual oxidase because it has both a peroxidase homology domain and a gp91phox domain. This protein generates hydrogen peroxide and thereby plays a role in the activity of thyroid peroxidase, lactoperoxidase, and in lactoperoxidase-mediated antimicrobial defense at mucosal surfaces. Two alternatively spliced transcript variants encoding the same protein have been described for this gene. DUOX1 NA
ENSG00000219073 23436 chymotrypsin like elastase family member 3B Elastases form a subfamily of serine proteases that hydrolyze many proteins in addition to elastin. Humans have six elastase genes which encode the structurally similar proteins elastase 1, 2, 2A, 2B, 3A, and 3B. Unlike other elastases, elastase 3B has little elastolytic activity. Like most of the human elastases, elastase 3B is secreted from the pancreas as a zymogen and, like other serine proteases such as trypsin, chymotrypsin and kallikrein, it has a digestive function in the intestine. Elastase 3B preferentially cleaves proteins after alanine residues. Elastase 3B may also function in the intestinal transport and metabolism of cholesterol. Both elastase 3A and elastase 3B have been referred to as protease E and as elastase 1, and excretion of this protein in fecal material is frequently used as a measure of pancreatic function in clinical assays. CELA3B NA
ENSG00000140459 1583 cytochrome P450 family 11 subfamily A member 1 This gene encodes a member of the cytochrome P450 superfamily of enzymes. The cytochrome P450 proteins are monooxygenases which catalyze many reactions involved in drug metabolism and synthesis of cholesterol, steroids and other lipids. This protein localizes to the mitochondrial inner membrane and catalyzes the conversion of cholesterol to pregnenolone, the first and rate-limiting step in the synthesis of the steroid hormones. Two transcript variants encoding different isoforms have been found for this gene. The cellular location of the smaller isoform is unclear since it lacks the mitochondrial-targeting transit peptide. CYP11A1 NA
ENSG00000162482 22977 aldo-keto reductase family 7 member A3 Aldo-keto reductases, such as AKR7A3, are involved in the detoxification of aldehydes and ketones. AKR7A3 NA
ENSG00000143320 1382 cellular retinoic acid binding protein 2 This gene encodes a member of the retinoic acid (RA, a form of vitamin A) binding protein family and lipocalin/cytosolic fatty-acid binding protein family. The protein is a cytosol-to-nuclear shuttling protein, which facilitates RA binding to its cognate receptor complex and transfer to the nucleus. It is involved in the retinoid signaling pathway, and is associated with increased circulating low-density lipoprotein cholesterol. Alternatively spliced transcript variants encoding the same protein have been found for this gene. CRABP2 NA
ENSG00000272275 ENSG00000272275 NA NA RP11-791G15.2 NA
ENSG00000268995 ENSG00000268995 vomeronasal 1 receptor 82 pseudogene NA VN1R82P NA
ENSG00000139540 283375 solute carrier family 39 member 5 The protein encoded by this gene belongs to the ZIP family of zinc transporters that transport zinc into cells from outside, and play a crucial role in controlling intracellular zinc levels. Zinc is an essential cofactor for many enzymes and proteins involved in gene transcription, growth, development and differentiation. Mutations in this gene have been associated with autosomal dominant high myopia (MYP24). Alternatively spliced transcript variants have been found for this gene. SLC39A5 NA
ENSG00000169347 2813 glycoprotein 2 This gene encodes an integral membrane protein that is secreted from intracellular zymogen granules and associates with the plasma membrane via glycosylphosphatidylinositol (GPI) linkage. The encoded protein binds pathogens such as enterobacteria, thereby playing an important role in the innate immune response. The C-terminus of this protein is related to the C-terminus of the protein encoded by the neighboring gene, uromodulin (UMOD). Alternative splicing results in multiple transcript variants. GP2 NA
ENSG00000128594 64101 leucine rich repeat containing 4 This gene is significantly downregulated in primary brain tumors. The exact function of the protein encoded by this gene is unknown. LRRC4 NA
ENSG00000082397 23136 erythrocyte membrane protein band 4.1 like 3 NA EPB41L3 NA
ENSG00000118271 7276 transthyretin This gene encodes transthyretin, one of the three prealbumins including alpha-1-antitrypsin, transthyretin and orosomucoid. Transthyretin is a carrier protein; it transports thyroid hormones in the plasma and cerebrospinal fluid, and also transports retinol (vitamin A) in the plasma. The protein consists of a tetramer of identical subunits. More than 80 different mutations in this gene have been reported; most mutations are related to amyloid deposition, affecting predominantly peripheral nerve and/or the heart, and a small portion of the gene mutations is non-amyloidogenic. The diseases caused by mutations include amyloidotic polyneuropathy, euthyroid hyperthyroxinaemia, amyloidotic vitreous opacities, cardiomyopathy, oculoleptomeningeal amyloidosis, meningocerebrovascular amyloidosis, carpal tunnel syndrome, etc. TTR NA
ENSG00000162551 249 alkaline phosphatase, liver/bone/kidney This gene encodes a member of the alkaline phosphatase family of proteins. There are at least four distinct but related alkaline phosphatases: intestinal, placental, placental-like, and liver/bone/kidney (tissue non-specific). The first three are located together on chromosome 2, while the tissue non-specific form is located on chromosome 1. The product of this gene is a membrane bound glycosylated enzyme that is not expressed in any particular tissue and is, therefore, referred to as the tissue-nonspecific form of the enzyme. Alternative splicing results in multiple transcript variants, at least one of which encodes a preproprotein that is proteolytically processed to generate the mature enzyme. This enzyme may play a role in bone mineralization. Mutations in this gene have been linked to hypophosphatasia, a disorder that is characterized by hypercalcemia and skeletal defects. ALPL NA
ENSG00000189184 54510 protocadherin 18 This gene belongs to the protocadherin gene family, a subfamily of the cadherin superfamily. This gene encodes a protein which contains 6 extracellular cadherin domains, a transmembrane domain and a cytoplasmic tail differing from those of the classical cadherins. Although its specific function is undetermined, the cadherin-related neuronal receptor is thought to play a role in the establishment and function of specific cell-cell connections in the brain. PCDH18 NA
ENSG00000129467 196883 adenylate cyclase 4 This gene encodes a member of the family of adenylate cyclases, which are membrane-associated enzymes that catalyze the formation of the secondary messenger cyclic adenosine monophosphate (cAMP). Mouse studies show that adenylate cyclase 4, along with adenylate cyclases 2 and 3, is expressed in olfactory cilia, suggesting that several different adenylate cyclases may couple to olfactory receptors and that there may be multiple receptor-mediated mechanisms for the generation of cAMP signals. Alternative splicing results in transcript variants. ADCY4 NA
ENSG00000142789 10136 chymotrypsin like elastase family member 3A Elastases form a subfamily of serine proteases that hydrolyze many proteins in addition to elastin. Humans have six elastase genes which encode the structurally similar proteins elastase 1, 2, 2A, 2B, 3A, and 3B. Unlike other elastases, elastase 3A has little elastolytic activity. Like most of the human elastases, elastase 3A is secreted from the pancreas as a zymogen and, like other serine proteases such as trypsin, chymotrypsin and kallikrein, it has a digestive function in the intestine. Elastase 3A preferentially cleaves proteins after alanine residues. Elastase 3A may also function in the intestinal transport and metabolism of cholesterol. Both elastase 3A and elastase 3B have been referred to as protease E and as elastase 1. CELA3A NA
ENSG00000089041 5027 purinergic receptor P2X 7 The product of this gene belongs to the family of purinoceptors for ATP. This receptor functions as a ligand-gated ion channel and is responsible for ATP-dependent lysis of macrophages through the formation of membrane pores permeable to large molecules. Activation of this nuclear receptor by ATP in the cytoplasm may be a mechanism by which cellular activity can be coupled to changes in gene expression. Multiple alternatively spliced variants have been identified, most of which fit nonsense-mediated decay (NMD) criteria. P2RX7 NA
ENSG00000171345 3880 keratin 19 The protein encoded by this gene is a member of the keratin family. The keratins are intermediate filament proteins responsible for the structural integrity of epithelial cells and are subdivided into cytokeratins and hair keratins. The type I cytokeratins consist of acidic proteins which are arranged in pairs of heterotypic keratin chains. Unlike its related family members, this smallest known acidic cytokeratin is not paired with a basic cytokeratin in epithelial cells. It is specifically expressed in the periderm, the transiently superficial layer that envelops the developing epidermis. The type I cytokeratins are clustered in a region of chromosome 17q12-q21. KRT19 NA
ENSG00000110245 345 apolipoprotein C3 Apolipoprotein C-III is a very low density lipoprotein (VLDL) protein. APOC3 inhibits lipoprotein lipase and hepatic lipase; it is thought to delay catabolism of triglyceride-rich particles. The APOA1, APOC3 and APOA4 genes are closely linked in both rat and human genomes. The A-I and A-IV genes are transcribed from the same strand, while the A-1 and C-III genes are convergently transcribed. An increase in apoC-III levels induces the development of hypertriglyceridemia. APOC3 NA
ENSG00000157107 115548 FCH domain only 2 NA FCHO2 NA
ENSG00000121680 9409 peroxisomal biogenesis factor 16 The protein encoded by this gene is an integral peroxisomal membrane protein. An inactivating nonsense mutation localized to this gene was observed in a patient with Zellweger syndrome of the complementation group CGD/CG9. Expression of this gene product morphologically and biochemically restores the formation of new peroxisomes, suggesting a role in peroxisome organization and biogenesis. Alternative splicing has been observed for this gene and two variants have been described. PEX16 NA
ENSG00000185559 8788 delta like non-canonical Notch ligand 1 This gene encodes a transmembrane protein that contains multiple epidermal growth factor repeats and functions as a regulator of cell growth. The encoded protein is involved in the differentiation of several cell types including adipocytes. This gene is located in a region of chromosome 14 frequently showing uniparental disomy, and is imprinted and expressed from the paternal allele. A single nucleotide variant in this gene is associated with child and adolescent obesity and shows polar overdominance, where heterozygotes carrying an active paternal allele express the phenotype, while mutant homozygotes are normal. DLK1 NA
ENSG00000132329 10267 receptor activity modifying protein 1 The protein encoded by this gene is a member of the RAMP family of single-transmembrane-domain proteins, called receptor (calcitonin) activity modifying proteins (RAMPs). RAMPs are type I transmembrane proteins with an extracellular N terminus and a cytoplasmic C terminus. RAMPs are required to transport calcitonin-receptor-like receptor (CRLR) to the plasma membrane. CRLR, a receptor with seven transmembrane domains, can function as either a calcitonin-gene-related peptide (CGRP) receptor or an adrenomedullin receptor, depending on which members of the RAMP family are expressed. In the presence of this (RAMP1) protein, CRLR functions as a CGRP receptor. The RAMP1 protein is involved in the terminal glycosylation, maturation, and presentation of the CGRP receptor to the cell surface. Alternative splicing results in multiple transcript variants encoding different isoforms. RAMP1 NA
ENSG00000259185 ENSG00000259185 NA NA RP11-56B16.4 NA
ENSG00000171033 5569 protein kinase (cAMP-dependent, catalytic) inhibitor alpha The protein encoded by this gene is a member of the cAMP-dependent protein kinase (PKA) inhibitor family. This protein was demonstrated to interact with and inhibit the activities of both C alpha and C beta catalytic subunits of the PKA. Alternatively spliced transcript variants encoding the same protein have been reported. PKIA NA
ENSG00000254198 ENSG00000254198 NA NA RP11-598P20.3 NA
ENSG00000196263 57573 zinc finger protein 471 NA ZNF471 NA
ENSG00000063127 28968 solute carrier family 6 member 16 SLC6A16 shows structural characteristics of an Na(+)- and Cl(-)-dependent neurotransmitter transporter, including 12 transmembrane (TM) domains, intracellular N and C termini, and large extracellular loops containing multiple N-glycosylation sites. SLC6A16 NA
ENSG00000162882 23498 3-hydroxyanthranilate 3,4-dioxygenase 3-Hydroxyanthranilate 3,4-dioxygenase is a monomeric cytosolic protein belonging to the family of intramolecular dioxygenases containing nonheme ferrous iron. It is widely distributed in peripheral organs, such as liver and kidney, and is also present in low amounts in the central nervous system. HAAO catalyzes the synthesis of quinolinic acid (QUIN) from 3-hydroxyanthranilic acid. QUIN is an excitotoxin whose toxicity is mediated by its ability to activate glutamate N-methyl-D-aspartate receptors. Increased cerebral levels of QUIN may participate in the pathogenesis of neurologic and inflammatory disorders. HAAO has been suggested to play a role in disorders associated with altered tissue levels of QUIN. HAAO NA
ENSG00000175445 4023 lipoprotein lipase LPL encodes lipoprotein lipase, which is expressed in heart, muscle, and adipose tissue. LPL functions as a homodimer, and has the dual functions of triglyceride hydrolase and ligand/bridging factor for receptor-mediated lipoprotein uptake. Severe mutations that cause LPL deficiency result in type I hyperlipoproteinemia, while less extreme mutations in LPL are linked to many disorders of lipoprotein metabolism. LPL NA
ENSG00000117477 57821 coiled-coil domain containing 181 NA CCDC181 NA
ENSG00000104723 7991 tumor suppressor candidate 3 This gene is a candidate tumor suppressor gene. It is located within a homozygously deleted region of a metastatic prostate cancer. The gene is expressed in most nonlymphoid human tissues including prostate, lung, liver, and colon. Expression was also detected in many epithelial tumor cell lines. Two transcript variants encoding distinct isoforms have been identified for this gene. TUSC3 NA
ENSG00000118898 5493 periplakin The protein encoded by this gene is a component of desmosomes and of the epidermal cornified envelope in keratinocytes. The N-terminal domain of this protein interacts with the plasma membrane and its C-terminus interacts with intermediate filaments. Through its rod domain, this protein forms complexes with envoplakin. This protein may serve as a link between the cornified envelope and desmosomes as well as intermediate filaments. AKT1/PKB, a protein kinase mediating a variety of cell growth and survival signaling processes, is reported to interact with this protein, suggesting a possible role for this protein as a localization signal in AKT1-mediated signaling. PPL NA
ENSG00000175535 5406 pancreatic lipase This gene is a member of the lipase gene family. It encodes a carboxyl esterase that hydrolyzes insoluble, emulsified triglycerides, and is essential for the efficient digestion of dietary fats. This gene is expressed specifically in the pancreas. PNLIP NA
ENSG00000165029 19 ATP binding cassette subfamily A member 1 The membrane-associated protein encoded by this gene is a member of the superfamily of ATP-binding cassette (ABC) transporters. ABC proteins transport various molecules across extra- and intracellular membranes. ABC genes are divided into seven distinct subfamilies (ABC1, MDR/TAP, MRP, ALD, OABP, GCN20, White). This protein is a member of the ABC1 subfamily. Members of the ABC1 subfamily comprise the only major ABC subfamily found exclusively in multicellular eukaryotes. With cholesterol as its substrate, this protein functions as a cholesterol efflux pump in the cellular lipid removal pathway. Mutations in this gene have been associated with Tangier’s disease and familial high-density lipoprotein deficiency. ABCA1 NA
ENSG00000232573 ENSG00000232573 ribosomal protein L3 pseudogene 4 NA RPL3P4 NA
ENSG00000149806 2197 Finkel-Biskis-Reilly murine sarcoma virus (FBR-MuSV) ubiquitously expressed This gene is the cellular homolog of the fox sequence in the Finkel-Biskis-Reilly murine sarcoma virus (FBR-MuSV). It encodes a fusion protein consisting of the ubiquitin-like protein fubi at the N terminus and ribosomal protein S30 at the C terminus. It has been proposed that the fusion protein is post-translationally processed to generate free fubi and free ribosomal protein S30. Fubi is a member of the ubiquitin family, and ribosomal protein S30 belongs to the S30E family of ribosomal proteins. Whereas the function of fubi is currently unknown, ribosomal protein S30 is a component of the 40S subunit of the cytoplasmic ribosome and displays antimicrobial activity. Pseudogenes derived from this gene are present in the genome. Similar to ribosomal protein S30, ribosomal proteins S27a and L40 are synthesized as fusion proteins with ubiquitin. FAU NA
ENSG00000132622 116835 heat shock protein family A (Hsp70) member 12B The protein encoded by this gene contains an atypical heat shock protein 70 (Hsp70) ATPase domain and is therefore a distant member of the mammalian Hsp70 family. This gene may be involved in susceptibility to atherosclerosis. Alternative splicing results in multiple transcript variants encoding different isoforms. HSPA12B NA
ENSG00000258177 ENSG00000258177 NA NA RP11-394J1.2 NA
ENSG00000139926 122786 FERM domain containing 6 NA FRMD6 NA
ENSG00000105427 84518 cornifelin NA CNFN NA
ENSG00000184990 10572 SIVA1 apoptosis inducing factor This gene encodes a protein with an important role in the apoptotic (programmed cell death) pathway induced by the CD27 antigen, a member of the tumor necrosis factor receptor (TNFR) superfamily. The CD27 antigen cytoplasmic tail binds to the N-terminus of this protein. Two alternatively spliced transcript variants encoding distinct proteins have been described. SIVA1 NA
ENSG00000169245 3627 C-X-C motif chemokine ligand 10 This antimicrobial gene encodes a chemokine of the CXC subfamily and ligand for the receptor CXCR3. Binding of this protein to CXCR3 results in pleiotropic effects, including stimulation of monocytes, natural killer and T-cell migration, and modulation of adhesion molecule expression. CXCL10 NA
ENSG00000006451 5898 RALA Ras like proto-oncogene A The product of this gene belongs to the small GTPase superfamily, Ras family of proteins. GTP-binding proteins mediate the transmembrane signaling initiated by the occupancy of certain cell surface receptors. This gene encodes a low molecular mass ras-like GTP-binding protein that shares about 50% similarity with other ras proteins. RALA NA
ENSG00000251442 ENSG00000251442 long intergenic non-protein coding RNA 1094 NA LINC01094 NA
ENSG00000104177 50804 myelin expression factor 2 NA MYEF2 NA
ENSG00000145824 9547 C-X-C motif chemokine ligand 14 This antimicrobial gene belongs to the cytokine gene family which encode secreted proteins involved in immunoregulatory and inflammatory processes. The protein encoded by this gene is structurally related to the CXC (Cys-X-Cys) subfamily of cytokines. Members of this subfamily are characterized by two cysteines separated by a single amino acid. This cytokine displays chemotactic activity for monocytes but not for lymphocytes, dendritic cells, neutrophils or macrophages. It has been implicated that this cytokine is involved in the homeostasis of monocyte-derived macrophages rather than in inflammation. CXCL14 NA
ENSG00000169860 5028 purinergic receptor P2Y1 The product of this gene belongs to the family of G-protein coupled receptors. This family has several receptor subtypes with different pharmacological selectivity, which overlaps in some cases, for various adenosine and uridine nucleotides. This receptor functions as a receptor for extracellular ATP and ADP. In platelets binding to ADP leads to mobilization of intracellular calcium ions via activation of phospholipase C, a change in platelet shape, and probably to platelet aggregation. P2RY1 NA
ENSG00000108950 54757 family with sequence similarity 20 member A This locus encodes a protein that is likely secreted and may function in hematopoiesis. A mutation at this locus has been associated with amelogenesis imperfecta and gingival hyperplasia syndrome. Alternatively spliced transcript variants have been identified. FAM20A NA
ENSG00000158270 81035 collectin subfamily member 12 This gene encodes a member of the C-lectin family, proteins that possess collagen-like sequences and carbohydrate recognition domains. This protein is a scavenger receptor, a cell surface glycoprotein that displays several functions associated with host defense. It can bind to carbohydrate antigens on microorganisms, facilitating their recognition and removal. It also mediates the recognition, internalization, and degradation of oxidatively modified low density lipoprotein by vascular endothelial cells. COLEC12 NA
ENSG00000114854 7134 troponin C1, slow skeletal and cardiac type Troponin is a central regulatory protein of striated muscle contraction, and together with tropomyosin, is located on the actin filament. Troponin consists of 3 subunits: TnI, which is the inhibitor of actomyosin ATPase; TnT, which contains the binding site for tropomyosin; and TnC, the protein encoded by this gene. The binding of calcium to TnC abolishes the inhibitory action of TnI, thus allowing the interaction of actin with myosin, the hydrolysis of ATP, and the generation of tension. Mutations in this gene are associated with cardiomyopathy dilated type 1Z. TNNC1 NA
ENSG00000214050 157574 F-box protein 16 This gene encodes a member of the F-box protein family, members of which are characterized by an approximately 40 amino acid motif, the F-box. The F-box proteins constitute one of the four subunits of ubiquitin protein ligase complex called SCFs (SKP1-cullin-F-box), which function in phosphorylation-dependent ubiquitination. The F-box proteins are divided into three classes: Fbws containing WD-40 domains, Fbls containing leucine-rich repeats, and Fbxs containing either different protein-protein interaction modules or no recognizable motifs. The protein encoded by this gene belongs to the Fbx class. Multiple transcript variants encoding different isoforms have been found for this gene. FBXO16 NA
ENSG00000132688 10763 nestin This gene encodes a member of the intermediate filament protein family and is expressed primarily in nerve cells. NES NA
ENSG00000075275 9620 cadherin EGF LAG seven-pass G-type receptor 1 The protein encoded by this gene is a member of the flamingo subfamily, part of the cadherin superfamily. The flamingo subfamily consists of nonclassic-type cadherins; a subpopulation that does not interact with catenins. The flamingo cadherins are located at the plasma membrane and have nine cadherin domains, seven epidermal growth factor-like repeats and two laminin A G-type repeats in their ectodomain. They also have seven transmembrane domains, a characteristic unique to this subfamily. It is postulated that these proteins are receptors involved in contact-mediated communication, with cadherin domains acting as homophilic binding regions and the EGF-like domains involved in cell adhesion and receptor-ligand interactions. This particular member is a developmentally regulated, neural-specific gene which plays an unspecified role in early embryogenesis. CELSR1 NA
ENSG00000116106 2043 EPH receptor A4 This gene belongs to the ephrin receptor subfamily of the protein-tyrosine kinase family. EPH and EPH-related receptors have been implicated in mediating developmental events, particularly in the nervous system. Receptors in the EPH subfamily typically have a single kinase domain and an extracellular region containing a Cys-rich domain and 2 fibronectin type III repeats. The ephrin receptors are divided into 2 groups based on the similarity of their extracellular domain sequences and their affinities for binding ephrin-A and ephrin-B ligands. Multiple transcript variants encoding different isoforms have been found for this gene. EPHA4 NA
ENSG00000114019 51421 angiomotin like 2 Angiomotin is a protein that binds angiostatin, a circulating inhibitor of the formation of new blood vessels (angiogenesis). Angiomotin mediates angiostatin inhibition of endothelial cell migration and tube formation in vitro. The protein encoded by this gene is related to angiomotin and is a member of the motin protein family. Alternative splicing results in multiple transcript variants of this gene. AMOTL2 NA
ENSG00000114770 10057 ATP binding cassette subfamily C member 5 The protein encoded by this gene is a member of the superfamily of ATP-binding cassette (ABC) transporters. ABC proteins transport various molecules across extra- and intra-cellular membranes. ABC genes are divided into seven distinct subfamilies (ABC1, MDR/TAP, MRP, ALD, OABP, GCN20, White). This protein is a member of the MRP subfamily which is involved in multi-drug resistance. This protein functions in the cellular export of its substrate, cyclic nucleotides. This export contributes to the degradation of phosphodiesterases and possibly an elimination pathway for cyclic nucleotides. Studies show that this protein provides resistance to thiopurine anticancer drugs, 6-mercaptopurine and thioguanine, and the anti-HIV drug 9-(2-phosphonylmethoxyethyl)adenine. This protein may be involved in resistance to thiopurines in acute lymphoblastic leukemia and antiretroviral nucleoside analogs in HIV-infected patients. Alternative splicing results in multiple transcript variants. ABCC5 NA
ENSG00000163141 149428 BCL2/adenovirus E1B 19kD interacting protein like The protein encoded by this gene interacts with several other proteins, such as BCL2, ARHGAP1, MIF and GFER. It may function as a bridge molecule between BCL2 and ARHGAP1/CDC42 in promoting cell death. Alternatively spliced transcript variants encoding different isoforms have been described for this gene. BNIPL NA
ENSG00000088002 6820 sulfotransferase family 2B member 1 Sulfotransferase enzymes catalyze the sulfate conjugation of many hormones, neurotransmitters, drugs, and xenobiotic compounds. These cytosolic enzymes are different in their tissue distributions and substrate specificities. The gene structure (number and length of exons) is similar among family members. This gene sulfates dehydroepiandrosterone but not 4-nitrophenol, a typical substrate for the phenol and estrogen sulfotransferase subfamilies. Two alternatively spliced variants that encode different isoforms have been described. SULT2B1 NA
ENSG00000250404 NA NA NA NA TRUE
ENSG00000182333 8513 lipase F, gastric type This gene encodes gastric lipase, an enzyme involved in the digestion of dietary triglycerides in the gastrointestinal tract, and responsible for 30% of fat digestion processes occurring in human. It is secreted by gastric chief cells in the fundic mucosa of the stomach, and it hydrolyzes the ester bonds of triglycerides under acidic pH conditions. The gene is a member of a conserved gene family of lipases that play distinct roles in neutral lipid metabolism. Several transcript variants encoding different isoforms have been found for this gene. LIPF NA
ENSG00000166923 26585 gremlin 1, DAN family BMP antagonist This gene encodes a member of the BMP (bone morphogenic protein) antagonist family. Like BMPs, BMP antagonists contain cystine knots and typically form homo- and heterodimers. The CAN (cerberus and dan) subfamily of BMP antagonists, to which this gene belongs, is characterized by a C-terminal cystine knot with an eight-membered ring. The antagonistic effect of the secreted glycosylated protein encoded by this gene is likely due to its direct binding to BMP proteins. As an antagonist of BMP, this gene may play a role in regulating organogenesis, body patterning, and tissue differentiation. In mouse, this protein has been shown to relay the sonic hedgehog (SHH) signal from the polarizing region to the apical ectodermal ridge during limb bud outgrowth. Alternatively spliced transcript variants encoding different isoforms have been found for this gene. GREM1 NA
ENSG00000176834 54621 V-set and immunoglobulin domain containing 10 NA VSIG10 NA
ENSG00000229874 NA NA NA NA TRUE
ENSG00000257831 ENSG00000257831 NA NA RP11-596D21.1 NA
ENSG00000112964 2690 growth hormone receptor This gene encodes a member of the type I cytokine receptor family, which is a transmembrane receptor for growth hormone. Binding of growth hormone to the receptor leads to receptor dimerization and the activation of an intra- and intercellular signal transduction pathway leading to growth. Mutations in this gene have been associated with Laron syndrome, also known as the growth hormone insensitivity syndrome (GHIS), a disorder characterized by short stature. In humans and rabbits, but not rodents, growth hormone binding protein (GHBP) is generated by proteolytic cleavage of the extracellular ligand-binding domain from the mature growth hormone receptor protein. Multiple alternatively spliced transcript variants have been found for this gene. GHR NA
ENSG00000250606 NA NA NA NA TRUE
ENSG00000128512 9732 dedicator of cytokinesis 4 This gene is a member of the dedicator of cytokinesis (DOCK) family and encodes a protein with a DHR-1 (CZH-1) domain, a DHR-2 (CZH-2) domain and an SH3 domain. This membrane-associated, cytoplasmic protein functions as a guanine nucleotide exchange factor and is involved in regulation of adherens junctions between cells. Mutations in this gene have been associated with ovarian, prostate, glioma, and colorectal cancers. Alternatively spliced variants which encode different protein isoforms have been described, but only one has been fully characterized. DOCK4 NA
ENSG00000133710 11005 serine peptidase inhibitor, Kazal type 5 This gene encodes a multidomain serine protease inhibitor that contains 15 potential inhibitory domains. The encoded preproprotein is proteolytically processed to generate multiple protein products, which may exhibit unique activities and specificities. These proteins may play a role in skin and hair morphogenesis, as well as anti-inflammatory and antimicrobial protection of mucous epithelia. Mutations in this gene may result in Netherton syndrome, a disorder characterized by ichthyosis, defective cornification, and atopy. This gene is present in a gene cluster on chromosome 5. Alternative splicing results in multiple transcript variants. SPINK5 NA
ENSG00000139832 55647 RAB20, member RAS oncogene family NA RAB20 NA
ENSG00000169554 9839 zinc finger E-box binding homeobox 2 The protein encoded by this gene is a member of the Zfh1 family of 2-handed zinc finger/homeodomain proteins. It is located in the nucleus and functions as a DNA-binding transcriptional repressor that interacts with activated SMADs. Mutations in this gene are associated with Hirschsprung disease/Mowat-Wilson syndrome. Alternatively spliced transcript variants have been found for this gene. ZEB2 NA
ENSG00000168484 6440 surfactant protein C This gene encodes the pulmonary-associated surfactant protein C (SPC), an extremely hydrophobic surfactant protein essential for lung function and homeostasis after birth. Pulmonary surfactant is a surface-active lipoprotein complex composed of 90% lipids and 10% proteins which include plasma proteins and apolipoproteins SPA, SPB, SPC and SPD. The surfactant is secreted by the alveolar cells of the lung and maintains the stability of pulmonary tissue by reducing the surface tension of fluids that coat the lung. Multiple mutations in this gene have been identified, which cause pulmonary surfactant metabolism dysfunction type 2, also called pulmonary alveolar proteinosis due to surfactant protein C deficiency, and are associated with interstitial lung disease in older infants, children, and adults. Alternatively spliced transcript variants encoding different protein isoforms have been identified. SFTPC NA
ENSG00000123009 ENSG00000123009 NME/NM23 nucleoside diphosphate kinase 2 pseudogene 1 NA NME2P1 NA
ENSG00000124785 51299 neuritin 1 This gene encodes a member of the neuritin family, and is expressed in postmitotic-differentiating neurons of the developmental nervous system and neuronal structures associated with plasticity in the adult. The expression of this gene can be induced by neural activity and neurotrophins. The encoded protein contains a consensus cleavage signal found in glycosylphosphatidylinositol (GPI)-anchored proteins. The encoded protein promotes neurite outgrowth and arborization, suggesting its role in promoting neuritogenesis. Overexpression of the encoded protein may be associated with astrocytoma progression. Alternative splicing results in multiple transcript variants. NRN1 NA
ENSG00000138161 50624 CUB and zona pellucida like domains 1 NA CUZD1 NA
ENSG00000197361 283807 F-box and leucine rich repeat protein 22 This gene encodes a member of the F-box protein family. This F-box protein interacts with S-phase kinase-associated protein 1A and cullin in order to form SCF complexes which function as ubiquitin ligases. FBXL22 NA
ENSG00000124225 56937 prostate transmembrane protein, androgen induced 1 This gene encodes a transmembrane protein that contains a Smad interacting motif (SIM). Expression of this gene is induced by androgens and transforming growth factor beta, and the encoded protein suppresses the androgen receptor and transforming growth factor beta signaling pathways through interactions with Smad proteins. Overexpression of this gene may play a role in multiple types of cancer. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene. PMEPA1 NA
ENSG00000104731 105371397 uncharacterized LOC105371397 NA LOC105371397 NA
ENSG00000104731 54758 kelch domain containing 4 NA KLHDC4 NA
ENSG00000213963 100130691 uncharacterized LOC100130691 NA LOC100130691 NA
ENSG00000162390 26027 acyl-CoA thioesterase 11 This gene encodes a member of the acyl-CoA thioesterase family which catalyse the conversion of activated fatty acids to the corresponding non-esterified fatty acid and coenzyme A. Expression of a mouse homolog in brown adipose tissue is induced by low temperatures and repressed by warm temperatures. Higher levels of expression of the mouse homolog have been found in obesity-resistant mice compared with obesity-prone mice, suggesting a role of acyl-CoA thioesterase 11 in obesity. Alternative splicing results in transcript variants. ACOT11 NA
ENSG00000139567 94 activin A receptor like type 1 This gene encodes a type I cell-surface receptor for the TGF-beta superfamily of ligands. It shares with other type I receptors a high degree of similarity in serine-threonine kinase subdomains, a glycine- and serine-rich region (called the GS domain) preceding the kinase domain, and a short C-terminal tail. The encoded protein, sometimes termed ALK1, shares similar domain structures with other closely related ALK or activin receptor-like kinase proteins that form a subfamily of receptor serine/threonine kinases. Mutations in this gene are associated with hemorrhagic telangiectasia type 2, also known as Rendu-Osler-Weber syndrome 2. ACVRL1 NA
ENSG00000103044 3038 hyaluronan synthase 3 The protein encoded by this gene is involved in the synthesis of the unbranched glycosaminoglycan hyaluronan, or hyaluronic acid, which is a major constituent of the extracellular matrix. This gene is a member of the NODC/HAS gene family. Compared to the proteins encoded by other members of this gene family, this protein appears to be more of a regulator of hyaluronan synthesis. Alternative splicing results in multiple transcript variants. HAS3 NA
ENSG00000213280 ENSG00000213280 NA NA RP11-212P7.1 NA
ENSG00000232774 400221 uncharacterized LOC400221 NA FLJ22447 NA
# Write the factor 7 query IDs (Ensembl gene IDs) to a text file, one per line.
write.table(out$query, paste0("../utilities/GTEX2013_sparse_fac_voom/gene_names_clus_", 7, ".txt"),
            col.names = FALSE, row.names = FALSE, quote = FALSE);
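
The factor 7 table above contains rows flagged as not found (for example ENSG00000250404) and one query that returned two hits (ENSG00000104731), so the file written from out$query will carry those IDs along, including the repeated one. Whether that is intended is not stated here. If a de-duplicated list of matched IDs is wanted instead, a minimal sketch could look like the following. The data frame `ann` is a hypothetical stand-in for as.data.frame(out) with only the columns needed, and `out_file` is a temporary path chosen for illustration, not the path used in this analysis.

```r
# Hypothetical stand-in for as.data.frame(out); the real table comes from
# mygene::queryMany. Only the columns used below are included.
ann <- data.frame(
  query    = c("ENSG00000104731", "ENSG00000104731",
               "ENSG00000250404", "ENSG00000232774"),
  symbol   = c("LOC105371397", "KLHDC4", NA, "FLJ22447"),
  notfound = c(NA, NA, TRUE, NA),
  stringsAsFactors = FALSE
)

# Keep matched rows: notfound is NA (or FALSE) when a hit was returned.
matched <- ann[is.na(ann$notfound) | !ann$notfound, ]

# Collapse queries that returned more than one hit to a single ID.
unique_ids <- unique(matched$query)

# Write one ID per line; tempfile() is used here only to keep the sketch
# free of side effects.
out_file <- tempfile(fileext = ".txt")
write.table(unique_ids, out_file, col.names = FALSE,
            row.names = FALSE, quote = FALSE)
```

Filtering on notfound before calling unique() keeps the output limited to IDs that mygene could annotate. Skipping the filter and calling unique(out$query) directly would instead preserve every input gene exactly once.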

Factor 8 Annotations

out <- mygene::queryMany(gene_list[8,],  scopes="ensembl.gene", fields=c("name", "summary", "symbol"), species="human");
## Finished
## Pass returnall=TRUE to return lists of duplicate or missing query terms.
kable(as.data.frame(out))
symbol X_id query name summary notfound
CHGB 1114 ENSG00000089199 chromogranin B This gene encodes a tyrosine-sulfated secretory protein abundant in peptidergic endocrine cells and neurons. This protein may serve as a precursor for regulatory peptides. NA
PRSS1 5644 ENSG00000204983 protease, serine 1 This gene encodes a trypsinogen, which is a member of the trypsin family of serine proteases. This enzyme is secreted by the pancreas and cleaved to its active form in the small intestine. It is active on peptide linkages involving the carboxyl group of lysine or arginine. Mutations in this gene are associated with hereditary pancreatitis. This gene and several other trypsinogen genes are localized to the T cell receptor beta locus on chromosome 7. NA
SLPI 6590 ENSG00000124107 secretory leukocyte peptidase inhibitor This gene encodes a secreted inhibitor which protects epithelial tissues from serine proteases. It is found in various secretions including seminal plasma, cervical mucus, and bronchial secretions, and has affinity for trypsin, leukocyte elastase, and cathepsin G. Its inhibitory effect contributes to the immune response by protecting epithelial surfaces from attack by endogenous proteolytic enzymes. This antimicrobial protein has antibacterial, antifungal and antiviral activity. NA
REG1A 5967 ENSG00000115386 regenerating family member 1 alpha This gene is a type I subclass member of the Reg gene family. The Reg gene family is a multigene family grouped into four subclasses, types I, II, III and IV, based on the primary structures of the encoded proteins. This gene encodes a protein that is secreted by the exocrine pancreas. It is associated with islet cell regeneration and diabetogenesis and may be involved in pancreatic lithogenesis. Reg family members REG1B, REGL, PAP and this gene are tandemly clustered on chromosome 2p12 and may have arisen from the same ancestral gene by gene duplication. NA
CELA3A 10136 ENSG00000142789 chymotrypsin like elastase family member 3A Elastases form a subfamily of serine proteases that hydrolyze many proteins in addition to elastin. Humans have six elastase genes which encode the structurally similar proteins elastase 1, 2, 2A, 2B, 3A, and 3B. Unlike other elastases, elastase 3A has little elastolytic activity. Like most of the human elastases, elastase 3A is secreted from the pancreas as a zymogen and, like other serine proteases such as trypsin, chymotrypsin and kallikrein, it has a digestive function in the intestine. Elastase 3A preferentially cleaves proteins after alanine residues. Elastase 3A may also function in the intestinal transport and metabolism of cholesterol. Both elastase 3A and elastase 3B have been referred to as protease E and as elastase 1. NA
MYH6 4624 ENSG00000197616 myosin, heavy chain 6, cardiac muscle, alpha Cardiac muscle myosin is a hexamer consisting of two heavy chain subunits, two light chain subunits, and two regulatory subunits. This gene encodes the alpha heavy chain subunit of cardiac myosin. The gene is located 4kb downstream of the gene encoding the beta heavy chain subunit of cardiac myosin. Mutations in this gene cause familial hypertrophic cardiomyopathy and atrial septal defect 3. NA
BNIPL 149428 ENSG00000163141 BCL2/adenovirus E1B 19kD interacting protein like The protein encoded by this gene interacts with several other proteins, such as BCL2, ARHGAP1, MIF and GFER. It may function as a bridge molecule between BCL2 and ARHGAP1/CDC42 in promoting cell death. Alternatively spliced transcript variants encoding different isoforms have been described for this gene. NA
TNFAIP8L3 388121 ENSG00000183578 TNF alpha induced protein 8 like 3 NA NA
PNLIP 5406 ENSG00000175535 pancreatic lipase This gene is a member of the lipase gene family. It encodes a carboxyl esterase that hydrolyzes insoluble, emulsified triglycerides, and is essential for the efficient digestion of dietary fats. This gene is expressed specifically in the pancreas. NA
CLPS 1208 ENSG00000137392 colipase The protein encoded by this gene is a cofactor needed by pancreatic lipase for efficient dietary lipid hydrolysis. It binds to the C-terminal, non-catalytic domain of lipase, thereby stabilizing an active conformation and considerably increasing the overall hydrophobic binding site. The gene product allows lipase to anchor noncovalently to the surface of lipid micelles, counteracting the destabilizing influence of intestinal bile salts. This cofactor is only expressed in pancreatic acinar cells, suggesting regulation of expression by tissue-specific elements. Three transcript variants encoding different isoforms have been found for this gene. NA
TRIM29 23650 ENSG00000137699 tripartite motif containing 29 The protein encoded by this gene belongs to the TRIM protein family. It has multiple zinc finger motifs and a leucine zipper motif. It has been proposed to form homo- or heterodimers which are involved in nucleic acid binding. Thus, it may act as a transcriptional regulatory factor involved in carcinogenesis and/or differentiation. It may also function in the suppression of radiosensitivity since it is associated with ataxia telangiectasia phenotype. NA
GP2 2813 ENSG00000169347 glycoprotein 2 This gene encodes an integral membrane protein that is secreted from intracellular zymogen granules and associates with the plasma membrane via glycosylphosphatidylinositol (GPI) linkage. The encoded protein binds pathogens such as enterobacteria, thereby playing an important role in the innate immune response. The C-terminus of this protein is related to the C-terminus of the protein encoded by the neighboring gene, uromodulin (UMOD). Alternative splicing results in multiple transcript variants. NA
CIDEC 63924 ENSG00000187288 cell death inducing DFFA like effector c This gene encodes a member of the cell death-inducing DNA fragmentation factor-like effector family. Members of this family play important roles in apoptosis. The encoded protein promotes lipid droplet formation in adipocytes and may mediate adipocyte apoptosis. This gene is regulated by insulin and its expression is positively correlated with insulin sensitivity. Mutations in this gene may contribute to insulin resistant diabetes. A pseudogene of this gene is located on the short arm of chromosome 3. Alternatively spliced transcript variants that encode different isoforms have been observed for this gene. NA
RP11-79P5.10 ENSG00000255883 ENSG00000255883 NA NA NA
SDC1 6382 ENSG00000115884 syndecan 1 The protein encoded by this gene is a transmembrane (type I) heparan sulfate proteoglycan and is a member of the syndecan proteoglycan family. The syndecans mediate cell binding, cell signaling, and cytoskeletal organization and syndecan receptors are required for internalization of the HIV-1 tat protein. The syndecan-1 protein functions as an integral membrane protein and participates in cell proliferation, cell migration and cell-matrix interactions via its receptor for extracellular matrix proteins. Altered syndecan-1 expression has been detected in several different tumor types. While several transcript variants may exist for this gene, the full-length natures of only two have been described to date. These two represent the major variants of this gene and encode the same protein. NA
S100A2 6273 ENSG00000196754 S100 calcium binding protein A2 The protein encoded by this gene is a member of the S100 family of proteins containing 2 EF-hand calcium-binding motifs. S100 proteins are localized in the cytoplasm and/or nucleus of a wide range of cells, and involved in the regulation of a number of cellular processes such as cell cycle progression and differentiation. S100 genes include at least 13 members which are located as a cluster on chromosome 1q21. This protein may have a tumor suppressor function. Chromosomal rearrangements and altered expression of this gene have been implicated in breast cancer. NA
REG1B 5968 ENSG00000172023 regenerating family member 1 beta This gene is a type I subclass member of the Reg gene family. The Reg gene family is a multigene family grouped into four subclasses, types I, II, III and IV based on the primary structures of the encoded proteins. This gene encodes a protein secreted by the exocrine pancreas that is highly similar to the REG1A protein. The related REG1A protein is associated with islet cell regeneration and diabetogenesis, and may be involved in pancreatic lithogenesis. Reg family members REG1A, REGL, PAP and this gene are tandemly clustered on chromosome 2p12 and may have arisen from the same ancestral gene by gene duplication. NA
CELA2A 63036 ENSG00000142615 chymotrypsin like elastase family member 2A Elastases form a subfamily of serine proteases that hydrolyze many proteins in addition to elastin. Humans have six elastase genes which encode the structurally similar proteins elastase 1, 2, 2A, 2B, 3A, and 3B. Like most of the human elastases, elastase 2A is secreted from the pancreas as a zymogen. In other species, elastase 2A has been shown to preferentially cleave proteins after leucine, methionine, and phenylalanine residues. NA
REG3A 5068 ENSG00000172016 regenerating family member 3 alpha This gene encodes a pancreatic secretory protein that may be involved in cell proliferation or differentiation. It has similarity to the C-type lectin superfamily. The enhanced expression of this gene is observed during pancreatic inflammation and liver carcinogenesis. The mature protein also functions as an antimicrobial protein with antibacterial activity. Alternate splicing results in multiple transcript variants that encode the same protein. NA
CCDC3 83643 ENSG00000151468 coiled-coil domain containing 3 NA NA
LPL 4023 ENSG00000175445 lipoprotein lipase LPL encodes lipoprotein lipase, which is expressed in heart, muscle, and adipose tissue. LPL functions as a homodimer, and has the dual functions of triglyceride hydrolase and ligand/bridging factor for receptor-mediated lipoprotein uptake. Severe mutations that cause LPL deficiency result in type I hyperlipoproteinemia, while less extreme mutations in LPL are linked to many disorders of lipoprotein metabolism. NA
TNNI3 7137 ENSG00000129991 troponin I3, cardiac type Troponin I (TnI), along with troponin T (TnT) and troponin C (TnC), is one of 3 subunits that form the troponin complex of the thin filaments of striated muscle. TnI is the inhibitory subunit; blocking actin-myosin interactions and thereby mediating striated muscle relaxation. The TnI subfamily contains three genes: TnI-skeletal-fast-twitch, TnI-skeletal-slow-twitch, and TnI-cardiac. This gene encodes the TnI-cardiac protein and is exclusively expressed in cardiac muscle tissues. Mutations in this gene cause familial hypertrophic cardiomyopathy type 7 (CMH7) and familial restrictive cardiomyopathy (RCM). NA
PLA2G1B 5319 ENSG00000170890 phospholipase A2 group IB This gene encodes a secreted member of the phospholipase A2 (PLA2) class of enzymes, which is produced by the pancreatic acinar cells. The encoded calcium-dependent enzyme catalyzes the hydrolysis of the sn-2 position of membrane glycerophospholipids to release arachidonic acid (AA) and lysophospholipids. AA is subsequently converted by downstream metabolic enzymes to several bioactive lipophilic compounds (eicosanoids), including prostaglandins (PGs) and leukotrienes (LTs). The enzyme may be involved in several physiological processes including cell contraction, cell proliferation and pathological response. NA
PRODH 5625 ENSG00000100033 proline dehydrogenase 1 This gene encodes a mitochondrial protein that catalyzes the first step in proline degradation. Mutations in this gene are associated with hyperprolinemia type 1 and susceptibility to schizophrenia 4 (SCZD4). This gene is located on chromosome 22q11.21, a region which has also been associated with the contiguous gene deletion syndromes, DiGeorge and CATCH22. Alternatively spliced transcript variants encoding different isoforms have been found for this gene. NA
LIPG 9388 ENSG00000101670 lipase G, endothelial type The protein encoded by this gene has substantial phospholipase activity and may be involved in lipoprotein metabolism and vascular biology. This protein is designated a member of the TG lipase family by its sequence and characteristic lid region which provides substrate specificity for enzymes of the TG lipase family. NA
NA NA ENSG00000250606 NA NA TRUE
LDHB 3945 ENSG00000111716 lactate dehydrogenase B This gene encodes the B subunit of lactate dehydrogenase enzyme, which catalyzes the interconversion of pyruvate and lactate with concomitant interconversion of NADH and NAD+ in a post-glycolysis process. Alternatively spliced transcript variants have been found for this gene. Recent studies have shown that a C-terminally extended isoform is produced by use of an alternative in-frame translation termination codon via a stop codon readthrough mechanism, and that this isoform is localized in the peroxisomes. Mutations in this gene are associated with lactate dehydrogenase B deficiency. Pseudogenes have been identified on chromosomes X, 5 and 13. NA
CTRB2 440387 ENSG00000168928 chymotrypsinogen B2 NA NA
PLTP 5360 ENSG00000100979 phospholipid transfer protein The protein encoded by this gene is one of at least two lipid transfer proteins found in human plasma. The encoded protein transfers phospholipids from triglyceride-rich lipoproteins to high density lipoprotein (HDL). In addition to regulating the size of HDL particles, this protein may be involved in cholesterol metabolism. At least two transcript variants encoding different isoforms have been found for this gene. NA
TRIM63 84676 ENSG00000158022 tripartite motif containing 63 This gene encodes a member of the RING zinc finger protein family found in striated muscle and iris. The product of this gene is an E3 ubiquitin ligase that localizes to the Z-line and M-line lattices of myofibrils. This protein plays an important role in the atrophy of skeletal and cardiac muscle and is required for the degradation of myosin heavy chain proteins, myosin light chain, myosin binding protein, and for muscle-type creatine kinase. NA
CSRP3 8048 ENSG00000129170 cysteine and glycine rich protein 3 This gene encodes a member of the CSRP family of LIM domain proteins, which may be involved in regulatory processes important for development and cellular differentiation. The LIM/double zinc-finger motif found in this protein is found in a group of proteins with critical functions in gene regulation, cell growth, and somatic differentiation. Mutations in this gene are thought to cause heritable forms of hypertrophic cardiomyopathy (HCM) and dilated cardiomyopathy (DCM) in humans. Alternatively spliced transcript variants with different 5’ UTR, but encoding the same protein, have been found for this gene. NA
LY6G6C 80740 ENSG00000204421 lymphocyte antigen 6 complex, locus G6C LY6G6C belongs to a cluster of leukocyte antigen-6 (LY6) genes located in the major histocompatibility complex (MHC) class III region on chromosome 6. Members of the LY6 superfamily typically contain 70 to 80 amino acids, including 8 to 10 cysteines. Most LY6 proteins are attached to the cell surface by a glycosylphosphatidylinositol (GPI) anchor that is directly involved in signal transduction (Mallya et al., 2002 [PubMed 12079290]). NA
IL34 146433 ENSG00000157368 interleukin 34 Interleukin-34 is a cytokine that promotes the differentiation and viability of monocytes and macrophages through the colony-stimulating factor-1 receptor (CSF1R; MIM 164770) (Lin et al., 2008 [PubMed 18467591]). NA
ISM1 140862 ENSG00000101230 isthmin 1, angiogenesis inhibitor NA NA
ETS2 2114 ENSG00000157557 ETS proto-oncogene 2, transcription factor This gene encodes a transcription factor which regulates genes involved in development and apoptosis. The encoded protein is also a protooncogene and shown to be involved in regulation of telomerase. A pseudogene of this gene is located on the X chromosome. Alternative splicing results in multiple transcript variants. NA
CTRB1 1504 ENSG00000168925 chymotrypsinogen B1 The protein encoded by this gene is one of a family of serine proteases that is secreted into the gastrointestinal tract as an inactive precursor, which is activated by proteolytic cleavage with trypsin. NA
RP11-34P1.2 ENSG00000254373 ENSG00000254373 NA NA NA
CYSRT1 375791 ENSG00000197191 cysteine rich tail 1 NA NA
SERPINB8 5271 ENSG00000166401 serpin family B member 8 The superfamily of high molecular weight serine proteinase inhibitors (serpins) regulates a diverse set of intracellular and extracellular processes such as complement activation, fibrinolysis, coagulation, cellular differentiation, tumor suppression, apoptosis, and cell migration. Serpins are characterized by a well-conserved tertiary structure that consists of 3 beta sheets and 8 or 9 alpha helices (Huber and Carrell, 1989 [PubMed 2690952]). A critical portion of the molecule, the reactive center loop, connects beta sheets A and C. Protease inhibitor-8 (PI8; SERPINB8) is a member of the ov-serpin subfamily, which, relative to the archetypal serpin PI1 (MIM 107400), is characterized by a high degree of homology to chicken ovalbumin, lack of N- and C-terminal extensions, absence of a signal peptide, and a serine rather than an asparagine residue at the penultimate position (summary by Bartuski et al., 1997 [PubMed 9268635]). NA
CSDC2 27254 ENSG00000172346 cold shock domain containing C2 NA NA
IL20RB 53833 ENSG00000174564 interleukin 20 receptor subunit beta IL20RB and IL20RA (MIM 605620) form a heterodimeric receptor for interleukin-20 (IL20; MIM 605619) (Blumberg et al., 2001 [PubMed 11163236]). NA
MDK 4192 ENSG00000110492 midkine (neurite growth-promoting factor 2) This gene encodes a member of a small family of secreted growth factors that binds heparin and responds to retinoic acid. The encoded protein promotes cell growth, migration, and angiogenesis, in particular during tumorigenesis. This gene has been targeted as a therapeutic for a variety of different disorders. Alternatively spliced transcript variants encoding multiple isoforms have been observed. NA
RP11-315I20.3 ENSG00000244619 ENSG00000244619 NA NA NA
ITGA10 8515 ENSG00000143127 integrin subunit alpha 10 Integrins are integral transmembrane glycoproteins composed of noncovalently linked alpha and beta chains. They participate in cell adhesion as well as cell-surface mediated signalling. This gene encodes an integrin alpha chain and is expressed at high levels in chondrocytes, where it is transcriptionally regulated by AP-2epsilon and Ets-1. The protein encoded by this gene binds to collagen. Alternative splicing results in multiple transcript variants. NA
LIPE 3991 ENSG00000079435 lipase E, hormone sensitive type The protein encoded by this gene has a long and a short form, generated by use of alternative translational start codons. The long form is expressed in steroidogenic tissues such as testis, where it converts cholesteryl esters to free cholesterol for steroid hormone production. The short form is expressed in adipose tissue, among others, where it hydrolyzes stored triglycerides to free fatty acids. NA
PNLIPRP1 5407 ENSG00000187021 pancreatic lipase related protein 1 NA NA
AZGP1 563 ENSG00000160862 alpha-2-glycoprotein 1, zinc-binding NA NA
RNF144A 9781 ENSG00000151692 ring finger protein 144A The protein encoded by this gene contains a RING finger, a motif known to be involved in protein-DNA and protein-protein interactions. The mouse counterpart of this protein has been shown to interact with Ube2l3/UbcM4, which is a ubiquitin-conjugating enzyme involved in embryonic development. NA
C3orf18 51161 ENSG00000088543 chromosome 3 open reading frame 18 NA NA
OSBPL6 114880 ENSG00000079156 oxysterol binding protein like 6 This gene encodes a member of the oxysterol-binding protein (OSBP) family, a group of intracellular lipid receptors. Most members contain an N-terminal pleckstrin homology domain and a highly conserved C-terminal OSBP-like sterol-binding domain. Transcript variants encoding different isoforms have been identified. NA
MYH7 4625 ENSG00000092054 myosin, heavy chain 7, cardiac muscle, beta Muscle myosin is a hexameric protein containing 2 heavy chain subunits, 2 alkali light chain subunits, and 2 regulatory light chain subunits. This gene encodes the beta (or slow) heavy chain subunit of cardiac myosin. It is expressed predominantly in normal human ventricle. It is also expressed in skeletal muscle tissues rich in slow-twitch type I muscle fibers. Changes in the relative abundance of this protein and the alpha (or fast) heavy subunit of cardiac myosin correlate with the contractile velocity of cardiac muscle. Its expression is also altered during thyroid hormone depletion and hemodynamic overloading. Mutations in this gene are associated with familial hypertrophic cardiomyopathy, myosin storage myopathy, dilated cardiomyopathy, and Laing early-onset distal myopathy. NA
SCAMP5 192683 ENSG00000198794 secretory carrier membrane protein 5 NA NA
NA NA ENSG00000156750 NA NA TRUE
QPRT 23475 ENSG00000103485 quinolinate phosphoribosyltransferase This gene encodes a key enzyme in catabolism of quinolinate, an intermediate in the tryptophan-nicotinamide adenine dinucleotide pathway. Quinolinate acts as a most potent endogenous excitotoxin to neurons. Elevation of quinolinate levels in the brain has been linked to the pathogenesis of neurodegenerative disorders such as epilepsy, Alzheimer’s disease, and Huntington’s disease. Alternative splicing results in multiple transcript variants. NA
IL1RN 3557 ENSG00000136689 interleukin 1 receptor antagonist The protein encoded by this gene is a member of the interleukin 1 cytokine family. This protein inhibits the activities of interleukin 1, alpha (IL1A) and interleukin 1, beta (IL1B), and modulates a variety of interleukin 1 related immune and inflammatory responses. This gene and five other closely related cytokine genes form a gene cluster spanning approximately 400 kb on chromosome 2. A polymorphism of this gene is reported to be associated with increased risk of osteoporotic fractures and gastric cancer. Several alternatively spliced transcript variants encoding distinct isoforms have been reported. NA
PKP2 5318 ENSG00000057294 plakophilin 2 This gene encodes a member of the arm-repeat (armadillo) and plakophilin gene families. Plakophilin proteins contain numerous armadillo repeats, localize to cell desmosomes and nuclei, and participate in linking cadherins to intermediate filaments in the cytoskeleton. This gene product may regulate the signaling activity of beta-catenin. Two alternately spliced transcripts encoding two protein isoforms have been identified. A processed pseudogene with high similarity to this locus has been mapped to chromosome 12p13. NA
MT1X 4501 ENSG00000187193 metallothionein 1X NA NA
LOC257396 257396 ENSG00000247796 uncharacterized LOC257396 NA NA
PPP1R1B 84152 ENSG00000131771 protein phosphatase 1 regulatory inhibitor subunit 1B This gene encodes a bifunctional signal transduction molecule. Dopaminergic and glutamatergic receptor stimulation regulates its phosphorylation and function as a kinase or phosphatase inhibitor. As a target for dopamine, this gene may serve as a therapeutic target for neurologic and psychiatric disorders. Multiple transcript variants encoding different isoforms have been found for this gene. NA
LINC01277 ENSG00000229017 ENSG00000229017 long intergenic non-protein coding RNA 1277 NA NA
CELA3B 23436 ENSG00000219073 chymotrypsin like elastase family member 3B Elastases form a subfamily of serine proteases that hydrolyze many proteins in addition to elastin. Humans have six elastase genes which encode the structurally similar proteins elastase 1, 2, 2A, 2B, 3A, and 3B. Unlike other elastases, elastase 3B has little elastolytic activity. Like most of the human elastases, elastase 3B is secreted from the pancreas as a zymogen and, like other serine proteases such as trypsin, chymotrypsin and kallikrein, it has a digestive function in the intestine. Elastase 3B preferentially cleaves proteins after alanine residues. Elastase 3B may also function in the intestinal transport and metabolism of cholesterol. Both elastase 3A and elastase 3B have been referred to as protease E and as elastase 1, and excretion of this protein in fecal material is frequently used as a measure of pancreatic function in clinical assays. NA
LOC105370792 105370792 ENSG00000174171 uncharacterized LOC105370792 NA NA
CST6 1474 ENSG00000175315 cystatin E/M The cystatin superfamily encompasses proteins that contain multiple cystatin-like sequences. Some of the members are active cysteine protease inhibitors, while others have lost or perhaps never acquired this inhibitory activity. There are three inhibitory families in the superfamily, including the type 1 cystatins (stefins), type 2 cystatins and the kininogens. The type 2 cystatin proteins are a class of cysteine proteinase inhibitors found in a variety of human fluids and secretions, where they appear to provide protective functions. This gene encodes a cystatin from the type 2 family, which is down-regulated in metastatic breast tumor cells as compared to primary tumor cells. Loss of expression is likely associated with the progression of a primary tumor to a metastatic phenotype. NA
CTD-2201G16.1 ENSG00000258444 ENSG00000258444 NA NA NA
CIB2 10518 ENSG00000136425 calcium and integrin binding family member 2 The protein encoded by this gene is similar to that of KIP/CIB, calcineurin B, and calmodulin. The encoded protein is a calcium-binding regulatory protein that interacts with DNA-dependent protein kinase catalytic subunits (DNA-PKcs), and it is involved in photoreceptor cell maintenance. Mutations in this gene cause deafness, autosomal recessive, 48 (DFNB48), and also Usher syndrome 1J (USH1J). Alternative splicing results in multiple transcript variants. NA
RP11-343H19.2 ENSG00000259827 ENSG00000259827 NA NA NA
UQCRHL 440567 ENSG00000233954 ubiquinol-cytochrome c reductase hinge protein like This gene has characteristics of a pseudogene derived from the UQCRH gene. However, there is still an open reading frame that could produce a protein of the same or nearly the same size as that of the UQCRH gene, so this gene is being called protein-coding for now. NA
KIAA1671 85379 ENSG00000197077 KIAA1671 NA NA
SLC29A4 222962 ENSG00000164638 solute carrier family 29 member 4 This gene encodes a member of the SLC29A/ENT transporter protein family. The encoded membrane protein catalyzes the reuptake of monoamines into presynaptic neurons, thus determining the intensity and duration of monoamine neural signaling. It has been shown to transport several compounds, including serotonin, dopamine, and the neurotoxin 1-methyl-4-phenylpyridinium. Alternative splicing results in multiple transcript variants. NA
CELA2B 51032 ENSG00000215704 chymotrypsin like elastase family member 2B Elastases form a subfamily of serine proteases that hydrolyze many proteins in addition to elastin. Humans have six elastase genes which encode the structurally similar proteins elastase 1, 2, 2A, 2B, 3A, and 3B. Like most of the human elastases, elastase 2B is secreted from the pancreas as a zymogen. In other species, elastase 2B has been shown to preferentially cleave proteins after leucine, methionine, and phenylalanine residues. NA
SLC47A1 55244 ENSG00000142494 solute carrier family 47 member 1 This gene is located within the Smith-Magenis syndrome region on chromosome 17. It encodes a protein of unknown function. NA
PACSIN1 29993 ENSG00000124507 protein kinase C and casein kinase substrate in neurons 1 NA NA
PTTG1 9232 ENSG00000164611 pituitary tumor-transforming 1 The encoded protein is a homolog of yeast securin proteins, which prevent separins from promoting sister chromatid separation. It is an anaphase-promoting complex (APC) substrate that associates with a separin until activation of the APC. The gene product has transforming activity in vitro and tumorigenic activity in vivo, and the gene is highly expressed in various tumors. The gene product contains 2 PXXP motifs, which are required for its transforming and tumorigenic activities, as well as for its stimulation of basic fibroblast growth factor expression. It also contains a destruction box (D box) that is required for its degradation by the APC. The acidic C-terminal region of the encoded protein can act as a transactivation domain. The gene product is mainly a cytosolic protein, although it partially localizes in the nucleus. Three transcript variants encoding the same protein have been found for this gene. NA
PXDC1 221749 ENSG00000168994 PX domain containing 1 NA NA
VLDLR-AS1 401491 ENSG00000236404 VLDLR antisense RNA 1 NA NA
DUSP4 1846 ENSG00000120875 dual specificity phosphatase 4 The protein encoded by this gene is a member of the dual specificity protein phosphatase subfamily. These phosphatases inactivate their target kinases by dephosphorylating both the phosphoserine/threonine and phosphotyrosine residues. They negatively regulate members of the mitogen-activated protein (MAP) kinase superfamily (MAPK/ERK, SAPK/JNK, p38), which are associated with cellular proliferation and differentiation. Different members of the family of dual specificity phosphatases show distinct substrate specificities for various MAP kinases, different tissue distribution and subcellular localization, and different modes of inducibility of their expression by extracellular stimuli. This gene product inactivates ERK1, ERK2 and JNK, is expressed in a variety of tissues, and is localized in the nucleus. Two alternatively spliced transcript variants, encoding distinct isoforms, have been observed for this gene. In addition, multiple polyadenylation sites have been reported. NA
KRT16 3868 ENSG00000186832 keratin 16 The protein encoded by this gene is a member of the keratin gene family. The keratins are intermediate filament proteins responsible for the structural integrity of epithelial cells and are subdivided into cytokeratins and hair keratins. Most of the type I cytokeratins consist of acidic proteins which are arranged in pairs of heterotypic keratin chains and are clustered in a region of chromosome 17q12-q21. This keratin has been coexpressed with keratin 14 in a number of epithelial tissues, including esophagus, tongue, and hair follicles. Mutations in this gene are associated with type 1 pachyonychia congenita, non-epidermolytic palmoplantar keratoderma and unilateral palmoplantar verrucous nevus. NA
CEL 1056 ENSG00000170835 carboxyl ester lipase The protein encoded by this gene is a glycoprotein secreted from the pancreas into the digestive tract and from the lactating mammary gland into human milk. The physiological role of this protein is in cholesterol and lipid-soluble vitamin ester hydrolysis and absorption. This encoded protein promotes large chylomicron production in the intestine. Also its presence in plasma suggests its interactions with cholesterol and oxidized lipoproteins to modulate the progression of atherosclerosis. In pancreatic tumoral cells, this encoded protein is thought to be sequestrated within the Golgi compartment and is probably not secreted. This gene contains a variable number of tandem repeat (VNTR) polymorphism in the coding region that may influence the function of the encoded protein. NA
SBSN 374897 ENSG00000189001 suprabasin NA NA
PDIA2 64714 ENSG00000185615 protein disulfide isomerase family A member 2 Protein disulfide isomerases (EC 5.3.4.1), such as PDIP, are endoplasmic reticulum (ER) resident proteins that catalyze protein folding and thiol-disulfide interchange reactions (Desilva et al., 1996 [PubMed 8561901]). NA
GADD45A 1647 ENSG00000116717 growth arrest and DNA damage inducible alpha This gene is a member of a group of genes whose transcript levels are increased following stressful growth arrest conditions and treatment with DNA-damaging agents. The protein encoded by this gene responds to environmental stresses by mediating activation of the p38/JNK pathway via MTK1/MEKK4 kinase. The DNA damage-induced transcription of this gene is mediated by both p53-dependent and -independent mechanisms. Alternatively spliced transcript variants encoding distinct isoforms have been found for this gene. NA
SPAG4 6676 ENSG00000061656 sperm associated antigen 4 The mammalian sperm flagellum contains two cytoskeletal structures associated with the axoneme: the outer dense fibers surrounding the axoneme in the midpiece and principal piece and the fibrous sheath surrounding the outer dense fibers in the principal piece of the tail. Defects in these structures are associated with abnormal tail morphology, reduced sperm motility, and infertility. In the rat, the protein encoded by this gene associates with an outer dense fiber protein via a leucine zipper motif and localizes to the microtubules of the manchette and axoneme during sperm tail development. Alternative splicing results in multiple transcript variants encoding different isoforms. NA
TG 7038 ENSG00000042832 thyroglobulin Thyroglobulin (Tg) is a glycoprotein homodimer produced predominantly by the thyroid gland. It acts as a substrate for the synthesis of thyroxine and triiodothyronine as well as the storage of the inactive forms of thyroid hormone and iodine. Thyroglobulin is secreted from the endoplasmic reticulum to its site of iodination, and subsequent thyroxine biosynthesis, in the follicular lumen. Mutations in this gene cause thyroid dyshormonogenesis, manifested as goiter, and are associated with moderate to severe congenital hypothyroidism. Polymorphisms in this gene are associated with susceptibility to autoimmune thyroid diseases (AITD) such as Graves disease and Hashimoto thyroiditis. NA
LIPF 8513 ENSG00000182333 lipase F, gastric type This gene encodes gastric lipase, an enzyme involved in the digestion of dietary triglycerides in the gastrointestinal tract, and responsible for 30% of fat digestion processes occurring in human. It is secreted by gastric chief cells in the fundic mucosa of the stomach, and it hydrolyzes the ester bonds of triglycerides under acidic pH conditions. The gene is a member of a conserved gene family of lipases that play distinct roles in neutral lipid metabolism. Several transcript variants encoding different isoforms have been found for this gene. NA
THBS4 7060 ENSG00000113296 thrombospondin 4 The protein encoded by this gene belongs to the thrombospondin protein family. Thrombospondin family members are adhesive glycoproteins that mediate cell-to-cell and cell-to-matrix interactions. This protein forms a pentamer and can bind to heparin and calcium. It is involved in local signaling in the developing and adult nervous system, and it contributes to spinal sensitization and neuropathic pain states. This gene is activated during the stromal response to invasive breast cancer. It may also play a role in inflammatory responses in Alzheimer’s disease. Alternative splicing results in multiple transcript variants. NA
MMP11 4320 ENSG00000099953 matrix metallopeptidase 11 Proteins of the matrix metalloproteinase (MMP) family are involved in the breakdown of extracellular matrix in normal physiological processes, such as embryonic development, reproduction, and tissue remodeling, as well as in disease processes, such as arthritis and metastasis. Most MMP’s are secreted as inactive proproteins which are activated when cleaved by extracellular proteinases. However, the enzyme encoded by this gene is activated intracellularly by furin within the constitutive secretory pathway. Also in contrast to other MMP’s, this enzyme cleaves alpha 1-proteinase inhibitor but weakly degrades structural proteins of the extracellular matrix. NA
LGALS7B 653499 ENSG00000178934 galectin 7B The galectins are a family of beta-galactoside-binding proteins implicated in modulating cell-cell and cell-matrix interactions. Differential and in situ hybridization studies indicate that this lectin is specifically expressed in keratinocytes and found mainly in stratified squamous epithelium. A duplicate copy of this gene (GeneID:3963) is found adjacent to, but on the opposite strand on chromosome 19. NA
NA NA ENSG00000165862 NA NA TRUE
MYBPC2 4606 ENSG00000086967 myosin binding protein C, fast type This gene encodes a member of the myosin-binding protein C family. This family includes the fast-, slow- and cardiac-type isoforms, each of which is a myosin-associated protein found in the cross-bridge-bearing zone (C region) of A bands in striated muscle. The protein encoded by this locus is referred to as the fast-type isoform. Mutations in the related but distinct genes encoding the slow-type and cardiac-type isoforms have been associated with distal arthrogryposis, type 1 and hypertrophic cardiomyopathy, respectively. NA
SOWAHC 65124 ENSG00000198142 sosondowah ankyrin repeat domain family member C NA NA
STAMBPL1 57559 ENSG00000138134 STAM binding protein like 1 NA NA
PGF 5228 ENSG00000119630 placental growth factor This gene encodes a growth factor found in placenta which is homologous to vascular endothelial growth factor. Alternatively spliced transcripts encoding different isoforms have been found for this gene. NA
NEDD4 4734 ENSG00000069869 neural precursor cell expressed, developmentally down-regulated 4, E3 ubiquitin protein ligase NA NA
RP11-256I23.1 ENSG00000268896 ENSG00000268896 NA NA NA
PRTFDC1 56952 ENSG00000099256 phosphoribosyl transferase domain containing 1 NA NA
PTPRN2 5799 ENSG00000155093 protein tyrosine phosphatase, receptor type N2 This gene encodes a protein with sequence similarity to receptor-like protein tyrosine phosphatases. However, tyrosine phosphatase activity has not been experimentally validated for this protein. Studies of the rat ortholog suggest that the encoded protein may instead function as a phosphatidylinositol phosphatase with the ability to dephosphorylate phosphatidylinositol 3-phosphate and phosphatidylinositol 4,5-diphosphate, and this function may be involved in the regulation of insulin secretion. This protein has been identified as an autoantigen in insulin-dependent diabetes mellitus. Alternative splicing results in multiple transcript variants. NA
TCAP 8557 ENSG00000173991 titin-cap Sarcomere assembly is regulated by the muscle protein titin. Titin is a giant elastic protein with kinase activity that extends half the length of a sarcomere. It serves as a scaffold to which myofibrils and other muscle related proteins are attached. This gene encodes a protein found in striated and cardiac muscle that binds to the titin Z1-Z2 domains and is a substrate of titin kinase, interactions thought to be critical to sarcomere assembly. Mutations in this gene are associated with limb-girdle muscular dystrophy type 2G. NA
FITM1 161247 ENSG00000139914 fat storage inducing transmembrane protein 1 FIT1 belongs to an evolutionarily conserved family of proteins involved in fat storage (Kadereit et al., 2008 [PubMed 18160536]). NA
RAB6B 51560 ENSG00000154917 RAB6B, member RAS oncogene family NA NA
GRAMD1B 57476 ENSG00000023171 GRAM domain containing 1B NA NA
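Rows with TRUE in the last column (for example ENSG00000250606) are Ensembl IDs that the query did not resolve, so they carry no symbol, name, or summary. The following is a minimal sketch of separating them from the annotated rows before display. It assumes the result has been converted with as.data.frame() and that the notfound column, when present, is TRUE for unresolved IDs and NA otherwise. The helper name split_by_found and the toy data frame are hypothetical and only reuse three rows from the table above.

```r
# Hypothetical helper: split an annotation table into resolved rows
# and the vector of query IDs that were not resolved.
split_by_found <- function(df) {
  if (is.null(df$notfound)) {
    # Assumption: no `notfound` column means every ID was resolved.
    missing <- rep(FALSE, nrow(df))
  } else {
    missing <- !is.na(df$notfound) & df$notfound
  }
  list(found = df[!missing, , drop = FALSE],
       unresolved = df$query[missing])
}

# Toy stand-in for as.data.frame(out); values copied from the table above.
toy <- data.frame(
  symbol   = c("LIPG", NA, "LDHB"),
  query    = c("ENSG00000101670", "ENSG00000250606", "ENSG00000111716"),
  notfound = c(NA, TRUE, NA),
  stringsAsFactors = FALSE
)

res <- split_by_found(toy)
print(res$found$symbol)   # symbols of the resolved genes
print(res$unresolved)     # Ensembl IDs with no annotation
```

Passing res$found to kable() would keep the unresolved IDs out of the table, while res$unresolved can be reported separately.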
# Save the Ensembl IDs queried for factor 8, one per line, for downstream use.
write.table(as.factor(out$query),
            file = paste0("../utilities/GTEX2013_sparse_fac_voom/gene_names_clus_", 8, ".txt"),
            col.names = FALSE, row.names = FALSE, quote = FALSE)
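The export above is repeated by hand for each factor. The following is a sketch of doing it in a loop, assuming gene_list is a character matrix with one row per factor, as built at the top of this analysis. The helper name write_factor_genes is hypothetical, the output directory is a temporary one rather than ../utilities, and the IDs are written from gene_list directly instead of out$query, which gives the same file only if the query returns one row per input ID in input order.

```r
# Hypothetical helper: write one file of Ensembl IDs per factor (row of gene_list).
write_factor_genes <- function(gene_list, out_dir) {
  paths <- character(nrow(gene_list))
  for (k in seq_len(nrow(gene_list))) {
    paths[k] <- file.path(out_dir, paste0("gene_names_clus_", k, ".txt"))
    writeLines(gene_list[k, ], paths[k])  # one ID per line, no quotes
  }
  invisible(paths)
}

# Toy stand-in for gene_list: 2 rows x 3 genes, IDs taken from this document.
toy_list <- rbind(
  c("ENSG00000170890", "ENSG00000100033", "ENSG00000101670"),
  c("ENSG00000100604", "ENSG00000010438", "ENSG00000155980")
)

out_dir <- tempfile("gene_names_")
dir.create(out_dir)
paths <- write_factor_genes(toy_list, out_dir)
print(basename(paths))
```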

Factor 9 Annotations

out <- mygene::queryMany(gene_list[9,],  scopes="ensembl.gene", fields=c("name", "summary", "symbol"), species="human");
## Finished
## Pass returnall=TRUE to return lists of duplicate or missing query terms.
kable(as.data.frame(out))
name X_id summary symbol query notfound
chromogranin A 1113 The protein encoded by this gene is a member of the chromogranin/secretogranin family of neuroendocrine secretory proteins. It is found in secretory vesicles of neurons and endocrine cells. This gene product is a precursor to three biologically active peptides; vasostatin, pancreastatin, and parastatin. These peptides act as autocrine or paracrine negative modulators of the neuroendocrine system. Two other peptides, catestatin and chromofungin, have antimicrobial activity and antifungal activity, respectively. Two transcript variants encoding different isoforms have been found for this gene. CHGA ENSG00000100604 NA
protease, serine 3 5646 This gene encodes a trypsinogen, which is a member of the trypsin family of serine proteases. This enzyme is expressed in the brain and pancreas and is resistant to common trypsin inhibitors. It is active on peptide linkages involving the carboxyl group of lysine or arginine. This gene is localized to the locus of T cell receptor beta variable orphans on chromosome 9. Four transcript variants encoding different isoforms have been described for this gene. PRSS3 ENSG00000010438 NA
kinesin family member 5A 3798 This gene encodes a member of the kinesin family of proteins. Members of this family are part of a multisubunit complex that functions as a microtubule motor in intracellular organelle transport. Mutations in this gene cause autosomal dominant spastic paraplegia 10. KIF5A ENSG00000155980 NA
carboxypeptidase A2 1358 Three different forms of human pancreatic procarboxypeptidase A have been isolated. The encoded protein represents the A2 form, which is a monomeric protein with different biochemical properties from the A1 and A3 forms. The A2 form of pancreatic procarboxypeptidase acts on aromatic C-terminal residues and is a secreted protein. CPA2 ENSG00000158516 NA
epithelial cell adhesion molecule 4072 This gene encodes a carcinoma-associated antigen and is a member of a family that includes at least two type I membrane proteins. This antigen is expressed on most normal epithelial cells and gastrointestinal carcinomas and functions as a homotypic calcium-independent cell adhesion molecule. The antigen is being used as a target for immunotherapy treatment of human carcinomas. Mutations in this gene result in congenital tufting enteropathy. EPCAM ENSG00000119888 NA
ST3 beta-galactoside alpha-2,3-sialyltransferase 6 10402 The protein encoded by this gene is a member of the sialyltransferase family. Members of this family are enzymes that transfer sialic acid from the activated cytidine 5’-monophospho-N-acetylneuraminic acid to terminal positions on sialylated glycolipids (gangliosides) or to the N- or O-linked sugar chains of glycoproteins. This protein has high specificity for neolactotetraosylceramide and neolactohexaosylceramide as glycolipid substrates and may contribute to the formation of selectin ligands and sialyl Lewis X, a carbohydrate important for cell-to-cell recognition and a blood group antigen. ST3GAL6 ENSG00000064225 NA
ATPase Na+/K+ transporting subunit beta 1 481 The protein encoded by this gene belongs to the family of Na+/K+ and H+/K+ ATPases beta chain proteins, and to the subfamily of Na+/K+ -ATPases. Na+/K+ -ATPase is an integral membrane protein responsible for establishing and maintaining the electrochemical gradients of Na and K ions across the plasma membrane. These gradients are essential for osmoregulation, for sodium-coupled transport of a variety of organic and inorganic molecules, and for electrical excitability of nerve and muscle. This enzyme is composed of two subunits, a large catalytic subunit (alpha) and a smaller glycoprotein subunit (beta). The beta subunit regulates, through assembly of alpha/beta heterodimers, the number of sodium pumps transported to the plasma membrane. The glycoprotein subunit of Na+/K+ -ATPase is encoded by multiple genes. This gene encodes a beta 1 subunit. Alternatively spliced transcript variants encoding different isoforms have been described, but their biological validity is not known. ATP1B1 ENSG00000143153 NA
procollagen C-endopeptidase enhancer 2 26577 NA PCOLCE2 ENSG00000163710 NA
hes family bHLH transcription factor 6 55502 This gene encodes a member of a subfamily of basic helix-loop-helix transcription repressors that have homology to the Drosophila enhancer of split genes. Members of this gene family regulate cell differentiation in numerous cell types. The protein encoded by this gene functions as a cofactor, interacting with other transcription factors through a tetrapeptide domain in its C-terminus. Alternatively spliced transcript variants encoding different isoforms have been described. HES6 ENSG00000144485 NA
syntaphilin 9751 Syntaxin-1, synaptobrevin/VAMP, and SNAP25 interact to form the SNARE complex, which is required for synaptic vesicle docking and fusion. The protein encoded by this gene is membrane-associated and inhibits SNARE complex formation by binding free syntaxin-1. Expression of this gene appears to be brain-specific. Alternative splicing results in multiple transcript variants encoding different isoforms. SNPH ENSG00000101298 NA
neuralized E3 ubiquitin protein ligase 1 9148 NA NEURL1 ENSG00000107954 NA
aquaporin 9 366 The aquaporins are a family of water-selective membrane channels. This gene encodes a member of a subset of aquaporins called the aquaglyceroporins. This protein allows passage of a broad range of noncharged solutes and also stimulates urea transport and osmotic water permeability. This protein may also facilitate the uptake of glycerol in hepatic tissue . The encoded protein may also play a role in specialized leukocyte functions such as immunological response and bactericidal activity. Alternate splicing results in multiple transcript variants. AQP9 ENSG00000103569 NA
prune homolog 2 158471 The protein encoded by this gene belongs to the B-cell CLL/lymphoma 2 and adenovirus E1B 19 kDa interacting family, whose members play roles in many cellular processes including apoptosis, cell transformation, and synaptic function. Several functions for this protein have been demonstrated including suppression of Ras homolog family member A activity, which results in reduced stress fiber formation and suppression of oncogenic cellular transformation. A high molecular weight isoform of this protein has also been shown to colocalize with Adaptor protein complex 2, beta-Adaptin and endodermal markers, suggesting an involvement in post-endocytic trafficking. In prostate cancer cells, this gene acts as a tumor suppressor and its expression is regulated by prostate cancer antigen 3, a non-protein coding gene on the opposite DNA strand in an intron of this gene. Prostate cancer antigen 3 regulates levels of this gene through formation of a double-stranded RNA that undergoes adenosine deaminase acting on RNA-dependent adenosine-to-inosine RNA editing. Alternative splicing results in multiple transcript variants. PRUNE2 ENSG00000106772 NA
progastricsin 5225 This gene encodes an aspartic proteinase that belongs to the peptidase family A1. The encoded protein is a digestive enzyme that is produced in the stomach and constitutes a major component of the gastric mucosa. This protein is also secreted into the serum. This protein is synthesized as an inactive zymogen that includes a highly basic prosegment. This enzyme is converted into its active mature form at low pH by sequential cleavage of the prosegment that is carried out by the enzyme itself. Polymorphisms in this gene are associated with susceptibility to gastric cancers. Serum levels of this enzyme are used as a biomarker for certain gastric diseases including Helicobacter pylori related gastritis. Alternate splicing results in multiple transcript variants. A pseudogene of this gene is found on chromosome 1. PGC ENSG00000096088 NA
phosphorylase kinase, alpha 1 pseudogene 1 ENSG00000232882 NA PHKA1P1 ENSG00000232882 NA
lin-7 homolog A, crumbs cell polarity complex component 8825 The protein encoded by this gene is involved in generating and maintaining the asymmetric distribution of channels and receptors at the cell membrane. The encoded protein also is required for the localization of some specific channels and can be part of a protein complex that couples synaptic vesicle exocytosis to cell adhesion in the brain. LIN7A ENSG00000111052 NA
PILR alpha associated neural protein 196500 This gene encodes a ligand for the paired immunoglobin-like type 2 receptor alpha, and so may be involved in immune regulation. Alternate splicing results in multiple transcript variants encoding different proteins. PIANP ENSG00000139200 NA
immunoglobulin heavy constant alpha 1 ENSG00000211895 NA IGHA1 ENSG00000211895 NA
glutathione peroxidase 2 2877 This gene is a member of the glutathione peroxidase family and encodes a selenium-dependent glutathione peroxidase that is one of two isoenzymes responsible for the majority of the glutathione-dependent hydrogen peroxide-reducing activity in the epithelium of the gastrointestinal tract. The protein encoded by this locus contains a selenocysteine (Sec) residue encoded by the UGA codon, which normally signals translation termination. Alternatively spliced transcript variants have been described. GPX2 ENSG00000176153 NA
immunoglobulin heavy constant alpha 2 (A2m marker) ENSG00000211890 NA IGHA2 ENSG00000211890 NA
potassium calcium-activated channel subfamily M alpha 1 3778 MaxiK channels are large conductance, voltage and calcium-sensitive potassium channels which are fundamental to the control of smooth muscle tone and neuronal excitability. MaxiK channels can be formed by 2 subunits: the pore-forming alpha subunit, which is the product of this gene, and the modulatory beta subunit. Intracellular calcium regulates the physical association between the alpha and beta subunits. Alternatively spliced transcript variants encoding different isoforms have been identified. KCNMA1 ENSG00000156113 NA
NA NA NA NA ENSG00000156750 TRUE
prostate stem cell antigen 8000 This gene encodes a glycosylphosphatidylinositol-anchored cell membrane glycoprotein. In addition to being highly expressed in the prostate it is also expressed in the bladder, placenta, colon, kidney, and stomach. This gene is up-regulated in a large proportion of prostate cancers and is also detected in cancers of the bladder and pancreas. This gene includes a polymorphism that results in an upstream start codon in some individuals; this polymorphism is thought to be associated with a risk for certain gastric and bladder cancers. Alternative splicing results in multiple transcript variants. PSCA ENSG00000167653 NA
myosin light chain 2 4633 This gene encodes the regulatory light chain associated with cardiac myosin beta (or slow) heavy chain. Ca2+ triggers the phosphorylation of regulatory light chain that in turn triggers contraction. Mutations in this gene are associated with mid-left ventricular chamber type hypertrophic cardiomyopathy. MYL2 ENSG00000111245 NA
transmembrane protein 158 (gene/pseudogene) 25907 Constitutive activation of the Ras pathway triggers an irreversible proliferation arrest reminiscent of replicative senescence. Transcription of this gene is upregulated in response to activation of the Ras pathway, but not under other conditions that induce senescence. The encoded protein is similar to a rat cell surface receptor proposed to function in a neuronal survival pathway. An allelic polymorphism in this gene results in both functional and non-functional (frameshifted) alleles; the reference genome represents the functional allele. TMEM158 ENSG00000249992 NA
NA ENSG00000261534 NA RP11-244O19.1 ENSG00000261534 NA
regulator of G-protein signaling 9 8787 This gene encodes a member of the RGS family of GTPase activating proteins that function in various signaling pathways by accelerating the deactivation of G proteins. This protein is anchored to photoreceptor membranes in retinal cells and deactivates G proteins in the rod and cone phototransduction cascades. Mutations in this gene result in bradyopsia. Multiple transcript variants encoding different isoforms have been found for this gene. RGS9 ENSG00000108370 NA
PITPNM family member 3 83394 This gene encodes a member of a family of membrane-associated phosphatidylinositol transfer domain-containing proteins. The calcium-binding protein has phosphatidylinositol (PI) transfer activity and interacts with the protein tyrosine kinase PTK2B (also known as PYK2). The protein is homologous to a Drosophila protein that is implicated in the visual transduction pathway in flies. Mutations in this gene result in autosomal dominant cone dystrophy. Multiple transcript variants encoding different isoforms have been found for this gene. PITPNM3 ENSG00000091622 NA
immunoglobulin lambda like polypeptide 5 100423062 This gene encodes one of the immunoglobulin lambda-like polypeptides. It is located within the immunoglobulin lambda locus but it does not require somatic rearrangement for expression. The first exon of this gene is unrelated to immunoglobulin variable genes; the second and third exons are the immunoglobulin lambda joining 1 and the immunoglobulin lambda constant 1 gene segments. Alternative splicing results in multiple transcript variants. IGLL5 ENSG00000254709 NA
prominin 2 150696 This gene encodes a member of the prominin family of pentaspan membrane glycoproteins. The encoded protein localizes to basal epithelial cells and may be involved in the organization of plasma membrane microdomains. Alternative splicing results in multiple transcript variants. PROM2 ENSG00000155066 NA
stratifin 2810 NA SFN ENSG00000175793 NA
transmembrane protein 59 like 25789 This gene encodes a predicted type-I membrane glycoprotein. The encoded protein may play a role in functioning of the central nervous system. TMEM59L ENSG00000105696 NA
polymeric immunoglobulin receptor 5284 This gene is a member of the immunoglobulin superfamily. The encoded poly-Ig receptor binds polymeric immunoglobulin molecules at the basolateral surface of epithelial cells; the complex is then transported across the cell to be secreted at the apical surface. A significant association was found between immunoglobulin A nephropathy and several SNPs in this gene. PIGR ENSG00000162896 NA
myosin light chain 3 4634 MYL3 encodes myosin light chain 3, an alkali light chain also referred to in the literature as both the ventricular isoform and the slow skeletal muscle isoform. Mutations in MYL3 have been identified as a cause of mid-left ventricular chamber type hypertrophic cardiomyopathy. MYL3 ENSG00000160808 NA
fucosyltransferase 2 2524 The protein encoded by this gene is a Golgi stack membrane protein that is involved in the creation of a precursor of the H antigen, which is required for the final step in the soluble A and B antigen synthesis pathway. This gene is one of two encoding the galactoside 2-L-fucosyltransferase enzyme. Two transcript variants encoding the same protein have been found for this gene. FUT2 ENSG00000176920 NA
mucin 1, cell surface associated 4582 This gene encodes a membrane-bound protein that is a member of the mucin family. Mucins are O-glycosylated proteins that play an essential role in forming protective mucous barriers on epithelial surfaces. These proteins also play a role in intracellular signaling. This protein is expressed on the apical surface of epithelial cells that line the mucosal surfaces of many different tissues including lung, breast, stomach and pancreas. This protein is proteolytically cleaved into alpha and beta subunits that form a heterodimeric complex. The N-terminal alpha subunit functions in cell-adhesion and the C-terminal beta subunit is involved in cell signaling. Overexpression, aberrant intracellular localization, and changes in glycosylation of this protein have been associated with carcinomas. This gene is known to contain a highly polymorphic variable number tandem repeats (VNTR) domain. Alternate splicing results in multiple transcript variants. MUC1 ENSG00000185499 NA
leucine rich repeat containing 4B 94030 NA LRRC4B ENSG00000131409 NA
neogenin 1 4756 This gene encodes a cell surface protein that is a member of the immunoglobulin superfamily. The encoded protein consists of four N-terminal immunoglobulin-like domains, six fibronectin type III domains, a transmembrane domain and a C-terminal internal domain that shares homology with the tumor suppressor candidate gene DCC. This protein may be involved in cell growth and differentiation and in cell-cell adhesion. Defects in this gene are associated with cell proliferation in certain cancers. Alternate splicing results in multiple transcript variants. NEO1 ENSG00000067141 NA
RAB25, member RAS oncogene family 57111 The protein encoded by this gene is a member of the RAS superfamily of small GTPases. The encoded protein is involved in membrane trafficking and cell survival. This gene has been found to be a tumor suppressor and an oncogene, depending on the context. Two variants, one protein-coding and the other not, have been found for this gene. RAB25 ENSG00000132698 NA
lipase F, gastric type 8513 This gene encodes gastric lipase, an enzyme involved in the digestion of dietary triglycerides in the gastrointestinal tract, and responsible for 30% of fat digestion processes occurring in humans. It is secreted by gastric chief cells in the fundic mucosa of the stomach, and it hydrolyzes the ester bonds of triglycerides under acidic pH conditions. The gene is a member of a conserved gene family of lipases that play distinct roles in neutral lipid metabolism. Several transcript variants encoding different isoforms have been found for this gene. LIPF ENSG00000182333 NA
atlastin GTPase 1 51062 The protein encoded by this gene is a GTPase and a Golgi body transmembrane protein. The encoded protein can form a homotetramer and has been shown to interact with spastin and with mitogen-activated protein kinase kinase kinase kinase 4. This protein may be involved in axonal maintenance as evidenced by the fact that defects in this gene are a cause of spastic paraplegia type 3. Three transcript variants encoding two different isoforms have been found for this gene. ATL1 ENSG00000198513 NA
C-terminal binding protein 2 1488 This gene produces alternative transcripts encoding two distinct proteins. One protein is a transcriptional repressor, while the other isoform is a major component of specialized synapses known as synaptic ribbons. Both proteins contain a NAD+ binding domain similar to NAD+-dependent 2-hydroxyacid dehydrogenases. A portion of the 3’ untranslated region was used to map this gene to chromosome 21q21.3; however, it was noted that similar loci elsewhere in the genome are likely. Blast analysis shows that this gene is present on chromosome 10. Several transcript variants encoding two different isoforms have been found for this gene. CTBP2 ENSG00000175029 NA
acyl-CoA synthetase long-chain family member 1 2180 The protein encoded by this gene is an isozyme of the long-chain fatty-acid-coenzyme A ligase family. Although differing in substrate specificity, subcellular localization, and tissue distribution, all isozymes of this family convert free long-chain fatty acids into fatty acyl-CoA esters, and thereby play a key role in lipid biosynthesis and fatty acid degradation. Several transcript variants encoding different isoforms have been found for this gene. ACSL1 ENSG00000151726 NA
NA ENSG00000261054 NA RP11-6O2.4 ENSG00000261054 NA
carbohydrate (N-acetylgalactosamine 4-sulfate 6-O) sulfotransferase 15 51363 Chondroitin sulfate (CS) is a glycosaminoglycan which is an important structural component of the extracellular matrix and which links to proteins to form proteoglycans. Chondroitin sulfate E (CS-E) is an isomer of chondroitin sulfate in which the C-4 and C-6 hydroxyl groups are sulfated. This gene encodes a type II transmembrane glycoprotein that acts as a sulfotransferase to transfer sulfate to the C-6 hydroxyl group of chondroitin sulfate. This gene has also been identified as being co-expressed with RAG1 in B-cells and as potentially acting as a B-cell surface signaling receptor. Alternative splicing results in multiple transcript variants encoding distinct isoforms. CHST15 ENSG00000182022 NA
syntaxin 11 8676 This gene encodes a member of the syntaxin family. Syntaxins have been implicated in the targeting and fusion of intracellular transport vesicles. This family member may regulate protein transport among late endosomes and the trans-Golgi network. Mutations in this gene have been associated with familial hemophagocytic lymphohistiocytosis. STX11 ENSG00000135604 NA
natriuretic peptide A 4878 The protein encoded by this gene belongs to the natriuretic peptide family. Natriuretic peptides are implicated in the control of extracellular fluid volume and electrolyte homeostasis. This protein is synthesized as a large precursor (containing a signal peptide), which is processed to release a peptide from the N-terminus with similarity to vasoactive peptide, cardiodilatin, and another peptide from the C-terminus with natriuretic-diuretic activity. Mutations in this gene have been associated with atrial fibrillation familial type 6. This gene is located adjacent to another member of the natriuretic family of peptides on chromosome 1. NPPA ENSG00000175206 NA
PDZ domain containing ring finger 3 23024 This gene encodes a member of the LNX (Ligand of Numb Protein-X) family of RING-type ubiquitin E3 ligases. This protein may function in vascular morphogenesis and the differentiation of adipocytes, osteoblasts and myoblasts. This protein may be targeted for degradation by the human papilloma virus E6 protein. Alternative splicing results in multiple transcript variants. PDZRN3 ENSG00000121440 NA
pancreatic lipase related protein 1 5407 NA PNLIPRP1 ENSG00000187021 NA
polypeptide N-acetylgalactosaminyltransferase 12 79695 This gene encodes a member of a family of UDP-GalNAc:polypeptide N-acetylgalactosaminyltransferases, which catalyze the transfer of N-acetylgalactosamine (GalNAc) from UDP-GalNAc to a serine or threonine residue on a polypeptide acceptor in the initial step of O-linked protein glycosylation. Mutations in this gene are associated with an increased susceptibility to colorectal cancer. GALNT12 ENSG00000119514 NA
SCOC antisense RNA 1 100129858 NA SCOC-AS1 ENSG00000196951 NA
nucleoredoxin 64359 This gene encodes a member of the thioredoxin superfamily, a group of small, multifunctional redox-active proteins. Members of this family are characterized by a conserved active motif called the thioredoxin fold that catalyzes disulfide bond formation and isomerization. The encoded protein acts as a redox-dependent regulator of the Wnt signaling pathway and is involved in cell growth and differentiation. NXN ENSG00000167693 NA
apolipoprotein C1 341 This gene encodes a member of the apolipoprotein C1 family. This gene is expressed primarily in the liver, and it is activated when monocytes differentiate into macrophages. The encoded protein plays a central role in high density lipoprotein (HDL) and very low density lipoprotein (VLDL) metabolism. This protein has also been shown to inhibit cholesteryl ester transfer protein in plasma. A pseudogene of this gene is located 4 kb downstream in the same orientation, on the same chromosome. This gene is mapped to chromosome 19, where it resides within an apolipoprotein gene cluster. APOC1 ENSG00000130208 NA
importin 7 pseudogene 2 ENSG00000225674 NA IPO7P2 ENSG00000225674 NA
immunoglobulin lambda constant 1 (Mcg marker) ENSG00000211675 NA IGLC1 ENSG00000211675 NA
cell death inducing DFFA like effector c 63924 This gene encodes a member of the cell death-inducing DNA fragmentation factor-like effector family. Members of this family play important roles in apoptosis. The encoded protein promotes lipid droplet formation in adipocytes and may mediate adipocyte apoptosis. This gene is regulated by insulin and its expression is positively correlated with insulin sensitivity. Mutations in this gene may contribute to insulin resistant diabetes. A pseudogene of this gene is located on the short arm of chromosome 3. Alternatively spliced transcript variants that encode different isoforms have been observed for this gene. CIDEC ENSG00000187288 NA
lipase E, hormone sensitive type 3991 The protein encoded by this gene has a long and a short form, generated by use of alternative translational start codons. The long form is expressed in steroidogenic tissues such as testis, where it converts cholesteryl esters to free cholesterol for steroid hormone production. The short form is expressed in adipose tissue, among others, where it hydrolyzes stored triglycerides to free fatty acids. LIPE ENSG00000079435 NA
olfactomedin 4 10562 This gene was originally cloned from human myeloblasts and found to be selectively expressed in inflamed colonic epithelium. This gene encodes a member of the olfactomedin family. The encoded protein is an antiapoptotic factor that promotes tumor growth and is an extracellular matrix glycoprotein that facilitates cell adhesion. OLFM4 ENSG00000102837 NA
cytochrome P450 family 2 subfamily J member 2 1573 This gene encodes a member of the cytochrome P450 superfamily of enzymes. The cytochrome P450 proteins are monooxygenases which catalyze many reactions involved in drug metabolism and synthesis of cholesterol, steroids and other lipids. This protein localizes to the endoplasmic reticulum and is thought to be the predominant enzyme responsible for epoxidation of endogenous arachidonic acid in cardiac tissue. Multiple transcript variants have been found for this gene. CYP2J2 ENSG00000134716 NA
membrane palmitoylated protein 7 143098 The protein encoded by this gene is a member of the p55 Stardust family of membrane-associated guanylate kinase (MAGUK) proteins, which function in the establishment of epithelial cell polarity. This family member forms a complex with the polarity protein DLG1 (discs, large homolog 1) and facilitates epithelial cell polarity and tight junction formation. Polymorphisms in this gene are associated with variations in site-specific bone mineral density (BMD). Alternative splicing results in multiple transcript variants. MPP7 ENSG00000150054 NA
NA ENSG00000254680 NA RP11-265D17.2 ENSG00000254680 NA
plastin 1 5357 Plastins are a family of actin-binding proteins that are conserved throughout eukaryote evolution and expressed in most tissues of higher eukaryotes. In humans, two ubiquitous plastin isoforms (L and T) have been identified. The protein encoded by this gene is a third distinct plastin isoform, which is specifically expressed at high levels in the small intestine. Alternatively spliced transcript variants varying in the 5’ UTR, but encoding the same protein, have been found for this gene. A pseudogene of this gene is found on chromosome 11. PLS1 ENSG00000120756 NA
NA NA NA NA ENSG00000250606 TRUE
ras homolog family member U 58480 This gene encodes a member of the Rho family of GTPases. This protein can activate PAK1 and JNK1, and can induce filopodium formation and stress fiber dissolution. It may also mediate the effects of WNT1 signaling in the regulation of cell morphology, cytoskeletal organization, and cell proliferation. A non-coding transcript variant of this gene results from naturally occurring read-through transcription between this locus and the neighboring DUSP5P (dual specificity phosphatase 5 pseudogene) locus. RHOU ENSG00000116574 NA
C1q and tumor necrosis factor related protein 3 114899 NA C1QTNF3 ENSG00000082196 NA
solute carrier family 22 member 17 51310 NA SLC22A17 ENSG00000092096 NA
KIAA1522 57648 NA KIAA1522 ENSG00000162522 NA
NA ENSG00000263065 NA AF001548.6 ENSG00000263065 NA
NA ENSG00000261240 NA RP11-304L19.4 ENSG00000261240 NA
transmembrane protein 54 113452 NA TMEM54 ENSG00000121900 NA
sine oculis binding protein homolog 55084 The protein encoded by this gene is a nuclear zinc finger protein that is involved in development of the cochlea. Defects in this gene have also been linked to intellectual disability. SOBP ENSG00000112320 NA
sphingosine-1-phosphate receptor 1 1901 The protein encoded by this gene is structurally similar to G protein-coupled receptors and is highly expressed in endothelial cells. It binds the ligand sphingosine-1-phosphate with high affinity and high specificity, and is suggested to be involved in the processes that regulate the differentiation of endothelial cells. Activation of this receptor induces cell-cell adhesion. Alternative splicing results in multiple transcript variants. S1PR1 ENSG00000170989 NA
S100 calcium binding protein B 6285 The protein encoded by this gene is a member of the S100 family of proteins containing 2 EF-hand calcium-binding motifs. S100 proteins are localized in the cytoplasm and/or nucleus of a wide range of cells, and involved in the regulation of a number of cellular processes such as cell cycle progression and differentiation. S100 genes include at least 13 members which are located as a cluster on chromosome 1q21; however, this gene is located at 21q22.3. This protein may function in neurite extension, proliferation of melanoma cells, stimulation of Ca2+ fluxes, inhibition of PKC-mediated phosphorylation, astrocytosis and axonal proliferation, and inhibition of microtubule assembly. Chromosomal rearrangements and altered expression of this gene have been implicated in several neurological, neoplastic, and other types of diseases, including Alzheimer’s disease, Down’s syndrome, epilepsy, amyotrophic lateral sclerosis, melanoma, and type I diabetes. S100B ENSG00000160307 NA
zinc finger protein 853 54753 NA ZNF853 ENSG00000236609 NA
proline rich transmembrane protein 2 112476 This gene encodes a transmembrane protein containing a proline-rich domain in its N-terminal half. Studies in mice suggest that it is predominantly expressed in brain and spinal cord in embryonic and postnatal stages. Mutations in this gene are associated with episodic kinesigenic dyskinesia-1. Alternatively spliced transcript variants encoding different isoforms have been found for this gene. PRRT2 ENSG00000167371 NA
NA NA NA NA ENSG00000225490 TRUE
discoidin domain receptor tyrosine kinase 1 780 Receptor tyrosine kinases play a key role in the communication of cells with their microenvironment. These kinases are involved in the regulation of cell growth, differentiation and metabolism. The protein encoded by this gene belongs to a subfamily of tyrosine kinase receptors with homology to Dictyostelium discoideum protein discoidin I in their extracellular domain, and that are activated by various types of collagen. Expression of this protein is restricted to epithelial cells, particularly in the kidney, lung, gastrointestinal tract, and brain. In addition, it has been shown to be significantly overexpressed in several human tumors. Alternatively spliced transcript variants encoding different isoforms have been described for this gene. DDR1 ENSG00000204580 NA
nudix hydrolase 8 254552 NA NUDT8 ENSG00000167799 NA
NA ENSG00000223774 NA RP11-307B6.3 ENSG00000223774 NA
BR serine/threonine kinase 1 84446 NA BRSK1 ENSG00000160469 NA
dysbindin (dystrobrevin binding protein 1) domain containing 1 79007 NA DBNDD1 ENSG00000003249 NA
retinol binding protein 1 5947 This gene encodes the carrier protein involved in the transport of retinol (vitamin A alcohol) from the liver storage site to peripheral tissue. Vitamin A is a fat-soluble vitamin necessary for growth, reproduction, differentiation of epithelial tissues, and vision. Multiple transcript variants encoding different isoforms have been found for this gene. RBP1 ENSG00000114115 NA
protease, serine 8 5652 This gene encodes a member of the peptidase S1 or chymotrypsin family of serine proteases. The encoded preproprotein is proteolytically processed to generate light and heavy chains that associate via a disulfide bond to form the heterodimeric enzyme. This enzyme is highly expressed in prostate epithelia and is one of several proteolytic enzymes found in seminal fluid. This protease exhibits trypsin-like substrate specificity, cleaving protein substrates at the carboxyl terminus of lysine or arginine residues. The encoded protease partially mediates proteolytic activation of the epithelial sodium channel, a regulator of sodium balance, and may also play a role in epithelial barrier formation. PRSS8 ENSG00000052344 NA
ectonucleotide pyrophosphatase/phosphodiesterase 5 (putative) 59084 This gene encodes a type-I transmembrane glycoprotein. Studies in rat suggest the encoded protein may play a role in neuronal cell communications. Alternatively spliced transcript variants have been described. ENPP5 ENSG00000112796 NA
acyl-CoA thioesterase 11 26027 This gene encodes a member of the acyl-CoA thioesterase family which catalyse the conversion of activated fatty acids to the corresponding non-esterified fatty acid and coenzyme A. Expression of a mouse homolog in brown adipose tissue is induced by low temperatures and repressed by warm temperatures. Higher levels of expression of the mouse homolog have been found in obesity-resistant mice compared with obesity-prone mice, suggesting a role of acyl-CoA thioesterase 11 in obesity. Alternative splicing results in transcript variants. ACOT11 ENSG00000162390 NA
dishevelled segment polarity protein 1 1855 DVL1, the human homolog of the Drosophila dishevelled gene (dsh) encodes a cytoplasmic phosphoprotein that regulates cell proliferation, acting as a transducer molecule for developmental processes, including segmentation and neuroblast specification. DVL1 is a candidate gene for neuroblastomatous transformation. The Schwartz-Jampel syndrome and Charcot-Marie-Tooth disease type 2A have been mapped to the same region as DVL1. The phenotypes of these diseases may be consistent with defects which might be expected from aberrant expression of a DVL gene during development. DVL1 ENSG00000107404 NA
intermediate filament family orphan 2 126917 NA IFFO2 ENSG00000169991 NA
ATPase phospholipid transporting 9A (putative) 10079 NA ATP9A ENSG00000054793 NA
myozenin 2 51778 The protein encoded by this gene belongs to a family of sarcomeric proteins that bind to calcineurin, a phosphatase involved in calcium-dependent signal transduction in diverse cell types. These family members tether calcineurin to alpha-actinin at the z-line of the sarcomere of cardiac and skeletal muscle cells, and thus they are important for calcineurin signaling. Mutations in this gene cause cardiomyopathy familial hypertrophic type 16, a hereditary heart disorder. MYOZ2 ENSG00000172399 NA
G protein subunit alpha z 2781 The protein encoded by this gene is a member of a G protein subfamily that mediates signal transduction in pertussis toxin-insensitive systems. This encoded protein may play a role in maintaining the ionic balance of perilymphatic and endolymphatic cochlear fluids. GNAZ ENSG00000128266 NA
beta-2-microglobulin 567 This gene encodes a serum protein found in association with the major histocompatibility complex (MHC) class I heavy chain on the surface of nearly all nucleated cells. The protein has a predominantly beta-pleated sheet structure that can form amyloid fibrils in some pathological conditions. The encoded antimicrobial protein displays antibacterial activity in amniotic fluid. A mutation in this gene has been shown to result in hypercatabolic hypoproteinemia. B2M ENSG00000166710 NA
NA ENSG00000247134 NA RP11-11N9.4 ENSG00000247134 NA
cullin associated and neddylation dissociated 2 (putative) 23066 NA CAND2 ENSG00000144712 NA
inositol-trisphosphate 3-kinase A 3706 Regulates inositol phosphate metabolism by phosphorylation of second messenger inositol 1,4,5-trisphosphate to Ins(1,3,4,5)P4. The activity of the inositol 1,4,5-trisphosphate 3-kinase is responsible for regulating the levels of a large number of inositol polyphosphates that are important in cellular signaling. Both calcium/calmodulin and protein phosphorylation mechanisms control its activity. It is also a substrate for the cyclic AMP-dependent protein kinase, calcium/calmodulin- dependent protein kinase II, and protein kinase C in vitro. ITPKA ENSG00000137825 NA
carbonic anhydrase 9 768 Carbonic anhydrases (CAs) are a large family of zinc metalloenzymes that catalyze the reversible hydration of carbon dioxide. They participate in a variety of biological processes, including respiration, calcification, acid-base balance, bone resorption, and the formation of aqueous humor, cerebrospinal fluid, saliva, and gastric acid. They show extensive diversity in tissue distribution and in their subcellular localization. CA IX is a transmembrane protein and is one of only two tumor-associated carbonic anhydrase isoenzymes known. It is expressed in all clear-cell renal cell carcinoma, but is not detected in normal kidney or most other normal tissues. It may be involved in cell proliferation and transformation. This gene was mapped to 17q21.2 by fluorescence in situ hybridization; however, radiation hybrid mapping localized it to 9p13-p12. CA9 ENSG00000107159 NA
NA ENSG00000229212 NA RP11-561C5.4 ENSG00000229212 NA
NA ENSG00000259684 NA RP11-120K9.2 ENSG00000259684 NA
microtubule associated monooxygenase, calponin and LIM domain containing 2 9645 NA MICAL2 ENSG00000133816 NA
p21 (RAC1) activated kinase 1 5058 This gene encodes a family member of serine/threonine p21-activating kinases, known as PAK proteins. These proteins are critical effectors that link RhoGTPases to cytoskeleton reorganization and nuclear signaling, and they serve as targets for the small GTP binding proteins Cdc42 and Rac. This specific family member regulates cell motility and morphology. Alternatively spliced transcript variants encoding different isoforms have been found for this gene. PAK1 ENSG00000149269 NA
naked cuticle homolog 1 85407 In the mouse, Nkd is a Dishevelled (see DVL1; MIM 601365)-binding protein that functions as a negative regulator of the Wnt (see WNT1; MIM 164820)-beta-catenin (see MIM 116806)-Tcf (see MIM 602272) signaling pathway. NKD1 ENSG00000140807 NA
write.table(out$query, paste0("../utilities/GTEX2013_sparse_fac_voom/gene_names_clus_", 9, ".txt"),
            col.names = FALSE, row.names = FALSE, quote = FALSE)
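
The export step above hard-codes the factor index in the file name. A minimal sketch of a reusable helper follows, assuming the hypothetical names `write_factor_genes` and `out_dir` (neither appears in the original analysis). It uses only base R and writes one Ensembl ID per line, which should match what the `write.table` call above produces.

```r
# Hypothetical helper (not part of the original analysis): write the query
# IDs for one factor to "<out_dir>/gene_names_clus_<factor>.txt".
write_factor_genes <- function(ids, factor, out_dir) {
  stopifnot(length(factor) == 1, dir.exists(out_dir))
  path <- file.path(out_dir, paste0("gene_names_clus_", factor, ".txt"))
  # One ID per line, no quotes or header, as in the write.table call above.
  writeLines(as.character(ids), path)
  invisible(path)
}

# Example with two Ensembl IDs taken from the Factor 9 table, written to a
# temporary directory rather than ../utilities/.
demo_path <- write_factor_genes(c("ENSG00000137825", "ENSG00000107159"),
                                factor = 9, out_dir = tempdir())
readLines(demo_path)
```

In the report itself the call would presumably be `write_factor_genes(out$query, 9, "../utilities/GTEX2013_sparse_fac_voom")`, so that only the factor index changes from one section to the next.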

Factor 10 Annotations

out <- mygene::queryMany(gene_list[10,],  scopes="ensembl.gene", fields=c("name", "summary", "symbol"), species="human");
## Finished
kable(as.data.frame(out))
X_id summary name symbol query
56265 This gene likely encodes a member of the carboxypeptidase family of proteins. Cloning of a comparable locus in mouse indicates that the encoded protein contains a discoidin domain and a carboxypeptidase domain, but the protein appears to lack residues necessary for carboxypeptidase activity. carboxypeptidase X (M14 family), member 1 CPXM1 ENSG00000088882
ENSG00000263065 NA NA AF001548.6 ENSG00000263065
ENSG00000263335 NA NA AF001548.5 ENSG00000263335
81610 NA family with sequence similarity 83 member D FAM83D ENSG00000101447
1674 This gene encodes a muscle-specific class III intermediate filament. Homopolymers of this protein form a stable intracytoplasmic filamentous network connecting myofibrils to each other and to the plasma membrane. Mutations in this gene are associated with desmin-related myopathy, a familial cardiac and skeletal myopathy (CSM), and with distal myopathies. desmin DES ENSG00000175084
9472 The A-kinase anchor proteins (AKAPs) are a group of structurally diverse proteins, which have the common function of binding to the regulatory subunit of protein kinase A (PKA) and confining the holoenzyme to discrete locations within the cell. This gene encodes a member of the AKAP family. The encoded protein is highly expressed in various brain regions and cardiac and skeletal muscle. It is specifically localized to the sarcoplasmic reticulum and nuclear membrane, and is involved in anchoring PKA to the nuclear membrane or sarcoplasmic reticulum. A-kinase anchoring protein 6 AKAP6 ENSG00000151320
ENSG00000234638 NA NA AC053503.6 ENSG00000234638
1264 NA calponin 1 CNN1 ENSG00000130176
4629 The protein encoded by this gene is a smooth muscle myosin belonging to the myosin heavy chain family. The gene product is a subunit of a hexameric protein that consists of two heavy chain subunits and two pairs of non-identical light chain subunits. It functions as a major contractile protein, converting chemical energy into mechanical energy through the hydrolysis of ATP. The gene encoding a human ortholog of rat NUDE1 is transcribed from the reverse strand of this gene, and its 3’ end overlaps with that of the latter. The pericentric inversion of chromosome 16 [inv(16)(p13q22)] produces a chimeric transcript that encodes a protein consisting of the first 165 residues from the N terminus of core-binding factor beta in a fusion with the C-terminal portion of the smooth muscle myosin heavy chain. This chromosomal rearrangement is associated with acute myeloid leukemia of the M4Eo subtype. Alternative splicing generates isoforms that are differentially expressed, with ratios changing during muscle cell maturation. Alternatively spliced transcript variants encoding different isoforms have been identified. myosin, heavy chain 11, smooth muscle MYH11 ENSG00000133392
51676 This gene encodes a member of the ankyrin repeat and SOCS box-containing (ASB) protein family. These proteins play a role in protein degradation by coupling suppressor of cytokine signalling (SOCS) proteins with the elongin BC complex. The encoded protein is a subunit of a multimeric E3 ubiquitin ligase complex that mediates the degradation of actin-binding proteins. This gene plays a role in retinoic acid-induced growth inhibition and differentiation of myeloid leukemia cells. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene. ankyrin repeat and SOCS box containing 2 ASB2 ENSG00000100628
9423 Netrin is included in a family of laminin-related secreted proteins. The function of this gene has not yet been defined; however, netrin is thought to be involved in axon guidance and cell migration during development. Mutations and loss of expression of netrin suggest that variation in netrin may be involved in cancer development. netrin 1 NTN1 ENSG00000065320
1000 This gene encodes a classical cadherin and member of the cadherin superfamily. Alternative splicing results in multiple transcript variants, at least one of which encodes a preproprotein that is proteolytically processed to generate a calcium-dependent cell adhesion molecule and glycoprotein. This protein plays a role in the establishment of left-right asymmetry, development of the nervous system and the formation of cartilage and bone. cadherin 2 CDH2 ENSG00000170558
4232 This gene encodes a member of the alpha/beta hydrolase superfamily. It is imprinted, exhibiting preferential expression from the paternal allele in fetal tissues, and isoform-specific imprinting in lymphocytes. The loss of imprinting of this gene has been linked to certain types of cancer and may be due to promoter switching. The encoded protein may play a role in development. Alternatively spliced transcript variants encoding multiple isoforms have been identified for this gene. Pseudogenes of this gene are located on the short arm of chromosomes 3 and 4, and the long arm of chromosomes 6 and 15. mesoderm specific transcript MEST ENSG00000106484
104326055 NA APOA1 antisense RNA APOA1-AS ENSG00000235910
22943 This gene encodes a protein that is a member of the dickkopf family. It is a secreted protein with two cysteine rich regions and is involved in embryonic development through its inhibition of the WNT signaling pathway. Elevated levels of DKK1 in bone marrow plasma and peripheral blood are associated with the presence of osteolytic bone lesions in patients with multiple myeloma. dickkopf WNT signaling pathway inhibitor 1 DKK1 ENSG00000107984
335 This gene encodes apolipoprotein A-I, which is the major protein component of high density lipoprotein (HDL) in plasma. The encoded preproprotein is proteolytically processed to generate the mature protein, which promotes cholesterol efflux from tissues to the liver for excretion, and is a cofactor for lecithin cholesterolacyltransferase (LCAT), an enzyme responsible for the formation of most plasma cholesteryl esters. This gene is closely linked with two other apolipoprotein genes on chromosome 11. Defects in this gene are associated with HDL deficiencies, including Tangier disease, and with systemic non-neuropathic amyloidosis. Alternative splicing results in multiple transcript variants, at least one of which encodes a preproprotein. apolipoprotein A1 APOA1 ENSG00000118137
57124 NA CD248 molecule CD248 ENSG00000174807
100885848 NA prostaglandin E synthase 3 (cytosolic)-like PTGES3L ENSG00000267060
147906 NA dishevelled binding antagonist of beta catenin 3 DACT3 ENSG00000197380
ENSG00000261054 NA NA RP11-6O2.4 ENSG00000261054
4897 Cell adhesion molecules (CAMs) are members of the immunoglobulin superfamily. This gene encodes a neuronal cell adhesion molecule with multiple immunoglobulin-like C2-type domains and fibronectin type-III domains. This ankyrin-binding protein is involved in neuron-neuron adhesion and promotes directional signaling during axonal cone growth. This gene is also expressed in non-neural tissues and may play a general role in cell-cell communication via signaling from its intracellular domain to the actin cytoskeleton during directional cell migration. Allelic variants of this gene have been associated with autism and addiction vulnerability. Alternative splicing results in multiple transcript variants encoding different isoforms. neuronal cell adhesion molecule NRCAM ENSG00000091129
5346 The protein encoded by this gene coats lipid storage droplets in adipocytes, thereby protecting them until they can be broken down by hormone-sensitive lipase. The encoded protein is the major cAMP-dependent protein kinase substrate in adipocytes and, when unphosphorylated, may play a role in the inhibition of lipolysis. Alternatively spliced transcript variants varying in the 5’ UTR, but encoding the same protein, have been found for this gene. perilipin 1 PLIN1 ENSG00000166819
117178 This gene encodes a protein that binds the cancer-testis antigen Synovial Sarcoma X breakpoint 2 protein. The encoded protein may regulate the activity of Synovial Sarcoma X breakpoint 2 protein in malignant cells. Alternate splicing results in multiple transcript variants. A pseudogene of this gene is found on chromosome 3. SSX family member 2 interacting protein SSX2IP ENSG00000117155
6591 This gene encodes a member of the Snail family of C2H2-type zinc finger transcription factors. The encoded protein acts as a transcriptional repressor that binds to E-box motifs and is also likely to repress E-cadherin transcription in breast carcinoma. This protein is involved in epithelial-mesenchymal transitions and has antiapoptotic activity. Mutations in this gene may be associated with sporadic cases of neural tube defects. snail family transcriptional repressor 2 SNAI2 ENSG00000019549
375061 NA family with sequence similarity 89 member A FAM89A ENSG00000182118
8736 The giant protein titin, together with its associated proteins, interconnects the major structure of sarcomeres, the M bands and Z discs. The C-terminal end of the titin string extends into the M line, where it binds tightly to M-band constituents of apparent molecular masses of 190 kD (myomesin 1) and 165 kD (myomesin 2). This protein, myomesin 1, like myomesin 2, titin, and other myofibrillar proteins contains structural modules with strong homology to either fibronectin type III (motif I) or immunoglobulin C2 (motif II) domains. Myomesin 1 and myomesin 2 each have a unique N-terminal region followed by 12 modules of motif I or motif II, in the arrangement II-II-I-I-I-I-I-II-II-II-II-II. The two proteins share 50% sequence identity in this repeat-containing region. The head structure formed by these 2 proteins on one end of the titin string extends into the center of the M band. The integrating structure of the sarcomere arises from muscle-specific members of the superfamily of immunoglobulin-like proteins. Alternatively spliced transcript variants encoding different isoforms have been identified. myomesin 1 MYOM1 ENSG00000101605
5881 The protein encoded by this gene is a GTPase which belongs to the RAS superfamily of small GTP-binding proteins. Members of this superfamily appear to regulate a diverse array of cellular events, including the control of cell growth, cytoskeletal reorganization, and the activation of protein kinases. Alternative splicing results in multiple transcript variants. ras-related C3 botulinum toxin substrate 3 (rho family, small GTP binding protein Rac3) RAC3 ENSG00000169750
ENSG00000231346 NA long intergenic non-protein coding RNA 1160 LINC01160 ENSG00000231346
100506826 NA MYLK antisense RNA 1 MYLK-AS1 ENSG00000239523
ENSG00000254756 NA NA RP11-867G23.12 ENSG00000254756
348093 NA RNA binding protein with multiple splicing 2 RBPMS2 ENSG00000166831
72 Actins are highly conserved proteins that are involved in various types of cell motility and in the maintenance of the cytoskeleton. Three types of actins, alpha, beta and gamma, have been identified in vertebrates. Alpha actins are found in muscle tissues and are a major constituent of the contractile apparatus. The beta and gamma actins co-exist in most cell types as components of the cytoskeleton and as mediators of internal cell motility. This gene encodes actin gamma 2; a smooth muscle actin found in enteric tissues. Alternative splicing results in multiple transcript variants encoding distinct isoforms. Based on similarity to peptide cleavage of related actins, the mature protein of this gene is formed by removal of two N-terminal peptides. actin, gamma 2, smooth muscle, enteric ACTG2 ENSG00000163017
4828 This gene encodes a member of the bombesin-like family of neuropeptides, which negatively regulate eating behavior. The encoded protein may regulate colonic smooth muscle contraction through binding to its cognate receptor, the neuromedin B receptor (NMBR). Polymorphisms of this gene may be associated with hunger, weight gain and obesity. Alternative splicing results in multiple transcript variants. neuromedin B NMB ENSG00000197696
50486 NA G0/G1 switch 2 G0S2 ENSG00000123689
26136 Cancer-associated chromosomal changes often involve regions containing fragile sites. This gene maps to a common fragile site on chromosome 7q31.2 designated FRA7G. This gene is similar to mouse Testin, a testosterone-responsive gene encoding a Sertoli cell secretory protein containing three LIM domains. LIM domains are double zinc-finger motifs that mediate protein-protein interactions between transcription factors, cytoskeletal proteins and signaling proteins. This protein is a negative regulator of cell growth and may act as a tumor suppressor. This scaffold protein may also play a role in cell adhesion, cell spreading and in the reorganization of the actin cytoskeleton. Multiple protein isoforms are encoded by transcript variants of this gene. testin LIM domain protein TES ENSG00000135269
2167 FABP4 encodes the fatty acid binding protein found in adipocytes. Fatty acid binding proteins are a family of small, highly conserved, cytoplasmic proteins that bind long-chain fatty acids and other hydrophobic ligands. It is thought that FABPs' roles include fatty acid uptake, transport, and metabolism. fatty acid binding protein 4 FABP4 ENSG00000170323
6288 This gene encodes a member of the serum amyloid A family of apolipoproteins. The encoded preproprotein is proteolytically processed to generate the mature protein. This protein is a major acute phase protein that is highly expressed in response to inflammation and tissue injury. This protein also plays an important role in HDL metabolism and cholesterol homeostasis. High levels of this protein are associated with chronic inflammatory diseases including atherosclerosis, rheumatoid arthritis, Alzheimer’s disease and Crohn’s disease. This protein may also be a potential biomarker for certain tumors. Alternate splicing results in multiple transcript variants that encode the same protein. A pseudogene of this gene is found on chromosome 11. serum amyloid A1 SAA1 ENSG00000173432
27296 NA TP53 target 5 TP53TG5 ENSG00000124251
1601 This gene encodes a mitogen-responsive phosphoprotein. It is expressed in normal ovarian epithelial cells, but is down-regulated or absent from ovarian carcinoma cell lines, suggesting its role as a tumor suppressor. This protein binds to the SH3 domains of GRB2, an adaptor protein that couples tyrosine kinase receptors to SOS (a guanine nucleotide exchange factor for Ras), via its C-terminal proline-rich sequences, and may thus modulate growth factor/Ras pathways by competing with SOS for binding to GRB2. Alternatively spliced transcript variants encoding different isoforms have been found for this gene. DAB2, clathrin adaptor protein DAB2 ENSG00000153071
100127983 NA chromosome 8 open reading frame 88 C8orf88 ENSG00000253250
9610 NA Ras and Rab interactor 1 RIN1 ENSG00000174791
58476 NA tumor protein p53 inducible nuclear protein 2 TP53INP2 ENSG00000078804
4638 This gene, a muscle member of the immunoglobulin gene superfamily, encodes myosin light chain kinase which is a calcium/calmodulin dependent enzyme. This kinase phosphorylates myosin regulatory light chains to facilitate myosin interaction with actin filaments to produce contractile activity. This gene encodes both smooth muscle and nonmuscle isoforms. In addition, using a separate promoter in an intron in the 3’ region, it encodes telokin, a small protein identical in sequence to the C-terminus of myosin light chain kinase, that is independently expressed in smooth muscle and functions to stabilize unphosphorylated myosin filaments. A pseudogene is located on the p arm of chromosome 3. Four transcript variants that produce four isoforms of the calcium/calmodulin dependent enzyme have been identified as well as two transcripts that produce two isoforms of telokin. Additional variants have been identified but lack full length transcripts. myosin light chain kinase MYLK ENSG00000065534
101929595 NA uncharacterized LOC101929595 LOC101929595 ENSG00000245293
1759 This gene encodes a member of the dynamin subfamily of GTP-binding proteins. The encoded protein possesses unique mechanochemical properties used to tubulate and sever membranes, and is involved in clathrin-mediated endocytosis and other vesicular trafficking processes. Actin and other cytoskeletal proteins act as binding partners for the encoded protein, which can also self-assemble leading to stimulation of GTPase activity. More than sixty highly conserved copies of the 3’ region of this gene are found elsewhere in the genome, particularly on chromosomes Y and 15. Alternatively spliced transcript variants encoding different isoforms have been described. dynamin 1 DNM1 ENSG00000106976
6324 Voltage-gated sodium channels are heteromeric proteins that function in the generation and propagation of action potentials in muscle and neuronal cells. They are composed of one alpha and two beta subunits, where the alpha subunit provides channel activity and the beta-1 subunit modulates the kinetics of channel inactivation. This gene encodes a sodium channel beta-1 subunit. Mutations in this gene result in generalized epilepsy with febrile seizures plus, Brugada syndrome 5, and defects in cardiac conduction. Multiple transcript variants encoding different isoforms have been found for this gene. sodium voltage-gated channel beta subunit 1 SCN1B ENSG00000105711
10544 The protein encoded by this gene is a receptor for activated protein C, a serine protease activated by and involved in the blood coagulation pathway. The encoded protein is an N-glycosylated type I membrane protein that enhances the activation of protein C. Mutations in this gene have been associated with venous thromboembolism and myocardial infarction, as well as with late fetal loss during pregnancy. The encoded protein may also play a role in malarial infection and has been associated with cancer. protein C receptor PROCR ENSG00000101000
7871 This gene encodes a component of a conserved striatin-interacting phosphatase and kinase complex. Striatin family complexes participate in a variety of cellular processes including signaling, cell cycle control, cell migration, Golgi assembly, and apoptosis. The protein encoded by this gene is a coiled-coil, tail-anchored membrane protein with a single C-terminal transmembrane domain that is posttranslationally inserted into membranes. Mutations in this gene are associated with Brugada syndrome, a cardiac channelopathy. Alternative splicing results in multiple transcript variants. sarcolemma associated protein SLMAP ENSG00000163681
23043 Germinal center kinases (GCKs), such as TNIK, are characterized by an N-terminal kinase domain and a C-terminal GCK domain that serves a regulatory function (Fu et al., 1999 [PubMed 10521462]). TRAF2 and NCK interacting kinase TNIK ENSG00000154310
101930114 NA uncharacterized LOC101930114 LOC101930114 ENSG00000227591
6623 This gene encodes a member of the synuclein family of proteins which are believed to be involved in the pathogenesis of neurodegenerative diseases. Mutations in this gene have also been associated with breast tumor development. synuclein gamma SNCG ENSG00000173267
79870 This gene was identified by gene expression studies in patients with acute myeloid leukemia (AML). The gene is conserved among mammals and is not found in lower organisms. Tissues that express this gene develop from the neuroectoderm. Multiple alternatively spliced transcript variants that encode different proteins have been described for this gene; however, some of the transcript variants are found only in AML cell lines. brain and acute leukemia, cytoplasmic BAALC ENSG00000164929
283807 This gene encodes a member of the F-box protein family. This F-box protein interacts with S-phase kinase-associated protein 1A and cullin in order to form SCF complexes which function as ubiquitin ligases. F-box and leucine rich repeat protein 22 FBXL22 ENSG00000197361
2901 This gene encodes a protein that belongs to the glutamate-gated ionic channel family. Glutamate functions as the major excitatory neurotransmitter in the central nervous system through activation of ligand-gated ion channels and G protein-coupled membrane receptors. The protein encoded by this gene forms functional heteromeric kainate-preferring ionic channels with the subunits encoded by related gene family members. Alternative splicing results in multiple transcript variants. glutamate ionotropic receptor kainate type subunit 5 GRIK5 ENSG00000105737
23336 The protein encoded by this gene is an intermediate filament (IF) family member. IF proteins are cytoskeletal proteins that confer resistance to mechanical stress and are encoded by a dispersed multigene family. This protein has been found to form a linkage between desmin, which is a subunit of the IF network, and the extracellular matrix, and provides an important structural support in muscle. Two alternatively spliced variants encoding different isoforms have been described for this gene. synemin SYNM ENSG00000182253
7079 This gene belongs to the TIMP gene family. The proteins encoded by this gene family are inhibitors of the matrix metalloproteinases, a group of peptidases involved in degradation of the extracellular matrix. The secreted, netrin domain-containing protein encoded by this gene is involved in regulation of platelet aggregation and recruitment and may play a role in hormonal regulation and endometrial tissue remodeling. TIMP metallopeptidase inhibitor 4 TIMP4 ENSG00000157150
646023 NA ADORA2A antisense RNA 1 ADORA2A-AS1 ENSG00000178803
100874032 NA PRRT3 antisense RNA 1 PRRT3-AS1 ENSG00000230082
1513 The protein encoded by this gene is a lysosomal cysteine proteinase involved in bone remodeling and resorption. This protein, which is a member of the peptidase C1 protein family, is predominantly expressed in osteoclasts. However, the encoded protein is also expressed in a significant fraction of human breast cancers, where it could contribute to tumor invasiveness. Mutations in this gene are the cause of pycnodysostosis, an autosomal recessive disease characterized by osteosclerosis and short stature. cathepsin K CTSK ENSG00000143387
51559 NA 5’-nucleotidase domain containing 3 NT5DC3 ENSG00000111696
6450 NA SH3 domain binding glutamate rich protein SH3BGR ENSG00000185437
4487 This gene encodes a member of the muscle segment homeobox gene family. The encoded protein functions as a transcriptional repressor during embryogenesis through interactions with components of the core transcription complex and other homeoproteins. It may also have roles in limb-pattern formation, craniofacial development, particularly odontogenesis, and tumor growth inhibition. Mutations in this gene, which was once known as homeobox 7, have been associated with nonsyndromic cleft lip with or without cleft palate 5, Witkop syndrome, Wolf-Hirschhorn syndrome, and autosomal dominant hypodontia. msh homeobox 1 MSX1 ENSG00000163132
10669 NA cell growth regulator with EF-hand domain 1 CGREF1 ENSG00000138028
5157 This gene encodes a protein with significant sequence similarity to the ligand binding domain of platelet-derived growth factor receptor beta. Mutations in this gene, or deletion of a chromosomal segment containing this gene, are associated with sporadic hepatocellular carcinomas, colorectal cancers, and non-small cell lung cancers. This suggests this gene product may function as a tumor suppressor. platelet derived growth factor receptor like PDGFRL ENSG00000104213
81544 Glycerophosphodiester phosphodiesterases (GDPDs; EC 3.1.4.46), such as GDPD5, are involved in glycerol metabolism (Lang et al., 2008 [PubMed 17578682]). glycerophosphodiester phosphodiesterase domain containing 5 GDPD5 ENSG00000158555
55897 NA mesoderm posterior bHLH transcription factor 1 MESP1 ENSG00000166823
ENSG00000230289 NA NA RP11-334J6.6 ENSG00000230289
63924 This gene encodes a member of the cell death-inducing DNA fragmentation factor-like effector family. Members of this family play important roles in apoptosis. The encoded protein promotes lipid droplet formation in adipocytes and may mediate adipocyte apoptosis. This gene is regulated by insulin and its expression is positively correlated with insulin sensitivity. Mutations in this gene may contribute to insulin resistant diabetes. A pseudogene of this gene is located on the short arm of chromosome 3. Alternatively spliced transcript variants that encode different isoforms have been observed for this gene. cell death inducing DFFA like effector c CIDEC ENSG00000187288
7301 The gene is part of a 3-member transmembrane receptor kinase receptor family with a processed pseudogene distal on chromosome 15. The encoded protein is activated by the products of the growth arrest-specific gene 6 and protein S genes and is involved in controlling cell survival and proliferation, spermatogenesis, immunoregulation and phagocytosis. The encoded protein has also been identified as a cell entry factor for Ebola and Marburg viruses. TYRO3 protein tyrosine kinase TYRO3 ENSG00000092445
2791 This gene is a member of the guanine nucleotide-binding protein (G protein) gamma family and encodes a lipid-anchored, cell membrane protein. As a member of the heterotrimeric G protein complex, this protein plays a role in this transmembrane signaling system. This protein is also subject to carboxyl-terminal processing. Decreased expression of this gene is associated with splenic marginal zone lymphomas. G protein subunit gamma 11 GNG11 ENSG00000127920
257177 NA cilia and flagella associated protein 126 CFAP126 ENSG00000188931
7106 The protein encoded by this gene is a member of the transmembrane 4 superfamily, also known as the tetraspanin family. Most of these members are cell-surface proteins that are characterized by the presence of four hydrophobic domains. The proteins mediate signal transduction events that play a role in the regulation of cell development, activation, growth and motility. This encoded protein is a cell surface glycoprotein and is similar in sequence to its family member CD53 antigen. It is known to complex with integrins and other transmembrane 4 superfamily proteins. Alternatively spliced transcript variants encoding different isoforms have been identified. tetraspanin 4 TSPAN4 ENSG00000214063
3991 The protein encoded by this gene has a long and a short form, generated by use of alternative translational start codons. The long form is expressed in steroidogenic tissues such as testis, where it converts cholesteryl esters to free cholesterol for steroid hormone production. The short form is expressed in adipose tissue, among others, where it hydrolyzes stored triglycerides to free fatty acids. lipase E, hormone sensitive type LIPE ENSG00000079435
29109 This gene encodes a protein which is a member of the formin/diaphanous family of proteins. The gene is ubiquitously expressed but is found in abundance in the spleen. The encoded protein has sequence homology to diaphanous and formin proteins within the Formin Homology (FH)1 and FH2 domains. It also contains a coiled-coil domain, a collagen-like domain, two nuclear localization signals, and several potential PKC and PKA phosphorylation sites. It is a predominantly cytoplasmic protein and is expressed in a variety of human cell lines. Alternative splicing results in multiple transcript variants. formin homology 2 domain containing 1 FHOD1 ENSG00000135723
115908 This locus encodes a protein that may play a role in the cellular response to arterial injury through involvement in vascular remodeling. Mutations at this locus have been associated with Barrett esophagus and esophageal adenocarcinoma. Alternatively spliced transcript variants have been described. collagen triple helix repeat containing 1 CTHRC1 ENSG00000164932
2194 The enzyme encoded by this gene is a multifunctional protein. Its main function is to catalyze the synthesis of palmitate from acetyl-CoA and malonyl-CoA, in the presence of NADPH, into long-chain saturated fatty acids. In some cancer cell lines, this protein has been found to be fused with estrogen receptor-alpha (ER-alpha), in which the N-terminus of FAS is fused in-frame with the C-terminus of ER-alpha. fatty acid synthase FASN ENSG00000169710
57449 This gene encodes a protein that activates the nuclear factor kappa B (NFKB1) signaling pathway. Mutations in this gene are associated with autosomal recessive distal spinal muscular atrophy. Multiple transcript variants encoding different isoforms have been found for this gene. pleckstrin homology and RhoGEF domain containing G5 PLEKHG5 ENSG00000171680
4660 Myosin phosphatase is a protein complex comprised of three subunits: a catalytic subunit (PP1c-delta, protein phosphatase 1, catalytic subunit delta), a large regulatory subunit (MYPT, myosin phosphatase target) and small regulatory subunit (sm-M20). Two isoforms of MYPT have been isolated, MYPT1 and MYPT2, the first of which is widely expressed, and the second of which may be specific to heart, skeletal muscle, and brain. Each of the MYPT isoforms functions to bind PP1c-delta and increase phosphatase activity. This locus encodes both MYPT2 and M20. Alternatively spliced transcript variants encoding different isoforms have been identified. Related pseudogenes have been defined on the Y chromosome. protein phosphatase 1 regulatory subunit 12B PPP1R12B ENSG00000077157
5118 Fibrillar collagen types I-III are synthesized as precursor molecules known as procollagens. These precursors contain amino- and carboxyl-terminal peptide extensions known as N- and C-propeptides, respectively, which are cleaved, upon secretion of procollagen from the cell, to yield the mature triple helical, highly structured fibrils. This gene encodes a glycoprotein which binds and drives the enzymatic cleavage of type I procollagen and heightens C-proteinase activity. procollagen C-endopeptidase enhancer PCOLCE ENSG00000106333
3306 NA heat shock protein family A (Hsp70) member 2 HSPA2 ENSG00000126803
5350 The protein encoded by this gene is found as a pentamer and is a major substrate for the cAMP-dependent protein kinase in cardiac muscle. The encoded protein is an inhibitor of cardiac muscle sarcoplasmic reticulum Ca(2+)-ATPase in the unphosphorylated state, but inhibition is relieved upon phosphorylation of the protein. The subsequent activation of the Ca(2+) pump leads to enhanced muscle relaxation rates, thereby contributing to the inotropic response elicited in heart by beta-agonists. The encoded protein is a key regulator of cardiac diastolic function. Mutations in this gene are a cause of inherited human dilated cardiomyopathy with refractory congestive heart failure, and also familial hypertrophic cardiomyopathy. phospholamban PLN ENSG00000198523
9454 This gene encodes a member of the HOMER family of postsynaptic density scaffolding proteins that share a similar domain structure consisting of an N-terminal Enabled/vasodilator-stimulated phosphoprotein homology 1 domain which mediates protein-protein interactions, and a carboxy-terminal coiled-coil domain and two leucine zipper motifs that are involved in self-oligomerization. The encoded protein binds numerous other proteins including group I metabotropic glutamate receptors, inositol 1,4,5-trisphosphate receptors and amyloid precursor proteins and has been implicated in diverse biological functions such as neuronal signaling, T-cell activation and trafficking of amyloid beta peptides. Alternative splicing results in multiple transcript variants. homer scaffolding protein 3 HOMER3 ENSG00000051128
2318 This gene encodes one of three related filamin genes, specifically gamma filamin. These filamin proteins crosslink actin filaments into orthogonal networks in cortical cytoplasm and participate in the anchoring of membrane proteins for the actin cytoskeleton. Three functional domains exist in filamin: an N-terminal filamentous actin-binding domain, a C-terminal self-association domain, and a membrane glycoprotein-binding domain. Two transcript variants encoding different isoforms have been found for this gene. filamin C FLNC ENSG00000128591
ENSG00000229894 NA NA RP11-668G10.2 ENSG00000229894
245711 NA speedy/RINGO cell cycle regulator family member A SPDYA ENSG00000163806
10267 The protein encoded by this gene is a member of the RAMP family of single-transmembrane-domain proteins, called receptor (calcitonin) activity modifying proteins (RAMPs). RAMPs are type I transmembrane proteins with an extracellular N terminus and a cytoplasmic C terminus. RAMPs are required to transport calcitonin-receptor-like receptor (CRLR) to the plasma membrane. CRLR, a receptor with seven transmembrane domains, can function as either a calcitonin-gene-related peptide (CGRP) receptor or an adrenomedullin receptor, depending on which members of the RAMP family are expressed. In the presence of this (RAMP1) protein, CRLR functions as a CGRP receptor. The RAMP1 protein is involved in the terminal glycosylation, maturation, and presentation of the CGRP receptor to the cell surface. Alternative splicing results in multiple transcript variants encoding different isoforms. receptor activity modifying protein 1 RAMP1 ENSG00000132329
151887 NA coiled-coil domain containing 80 CCDC80 ENSG00000091986
9501 The protein encoded by this gene plays a direct regulatory role in calcium-ion-dependent exocytosis in both endocrine and exocrine cells and plays a key role in insulin secretion by pancreatic cells. This gene is likely a tumor suppressor. Alternative splicing results in multiple transcript variants encoding distinct isoforms. rabphilin 3A-like (without C2 domains) RPH3AL ENSG00000181031
1846 The protein encoded by this gene is a member of the dual specificity protein phosphatase subfamily. These phosphatases inactivate their target kinases by dephosphorylating both the phosphoserine/threonine and phosphotyrosine residues. They negatively regulate members of the mitogen-activated protein (MAP) kinase superfamily (MAPK/ERK, SAPK/JNK, p38), which are associated with cellular proliferation and differentiation. Different members of the family of dual specificity phosphatases show distinct substrate specificities for various MAP kinases, different tissue distribution and subcellular localization, and different modes of inducibility of their expression by extracellular stimuli. This gene product inactivates ERK1, ERK2 and JNK, is expressed in a variety of tissues, and is localized in the nucleus. Two alternatively spliced transcript variants, encoding distinct isoforms, have been observed for this gene. In addition, multiple polyadenylation sites have been reported. dual specificity phosphatase 4 DUSP4 ENSG00000120875
4192 This gene encodes a member of a small family of secreted growth factors that binds heparin and responds to retinoic acid. The encoded protein promotes cell growth, migration, and angiogenesis, in particular during tumorigenesis. This gene has been targeted as a therapeutic for a variety of different disorders. Alternatively spliced transcript variants encoding multiple isoforms have been observed. midkine (neurite growth-promoting factor 2) MDK ENSG00000110492
205 This gene encodes a member of the adenylate kinase family of enzymes. The encoded protein is localized to the mitochondrial matrix. Adenylate kinases regulate the adenine and guanine nucleotide compositions within a cell by catalyzing the reversible transfer of phosphate group among these nucleotides. Five isozymes of adenylate kinase have been identified in vertebrates. Expression of these isozymes is tissue-specific and developmentally regulated. A pseudogene for this gene has been located on chromosome 17. Three transcript variants encoding the same protein have been identified for this gene. Sequence alignment suggests that the gene defined by NM_013410, NM_203464, and NM_001005353 is located on chromosome 1. adenylate kinase 4 AK4 ENSG00000162433
284358 NA MEF2 activating motif and SAP domain containing transcriptional regulator MAMSTR ENSG00000176909
5468 This gene encodes a member of the peroxisome proliferator-activated receptor (PPAR) subfamily of nuclear receptors. PPARs form heterodimers with retinoid X receptors (RXRs) and these heterodimers regulate transcription of various genes. Three subtypes of PPARs are known: PPAR-alpha, PPAR-delta, and PPAR-gamma. The protein encoded by this gene is PPAR-gamma and is a regulator of adipocyte differentiation. Additionally, PPAR-gamma has been implicated in the pathology of numerous diseases including obesity, diabetes, atherosclerosis and cancer. Alternatively spliced transcript variants that encode different isoforms have been described. peroxisome proliferator activated receptor gamma PPARG ENSG00000132170
8165 The A-kinase anchor proteins (AKAPs) are a group of structurally diverse proteins, which have the common function of binding to the regulatory subunit of protein kinase A (PKA) and confining the holoenzyme to discrete locations within the cell. This gene encodes a member of the AKAP family. The encoded protein binds to type I and type II regulatory subunits of PKA and anchors them to the mitochondrion. This protein is speculated to be involved in the cAMP-dependent signal transduction pathway and in directing RNA to a specific cellular compartment. A-kinase anchoring protein 1 AKAP1 ENSG00000121057
ENSG00000231050 NA NA RP1-140A9.1 ENSG00000231050
11149 This gene encodes a member of the POP family of proteins containing three putative transmembrane domains. This gene is expressed in cardiac and skeletal muscle and may play an important role in development of these tissues. The mouse ortholog may be involved in the regeneration of adult skeletal muscle and may act as a cell adhesion molecule in coronary vasculogenesis. Three transcript variants encoding the same protein have been found for this gene. blood vessel epicardial substance BVES ENSG00000112276
5802 The protein encoded by this gene is a member of the protein tyrosine phosphatase (PTP) family. PTPs are known to be signaling molecules that regulate a variety of cellular processes including cell growth, differentiation, mitotic cycle, and oncogenic transformation. This PTP contains an extracellular region, a single transmembrane segment and two tandem intracytoplasmic catalytic domains, and thus represents a receptor-type PTP. The extracellular region of this protein is composed of multiple Ig-like and fibronectin type III-like domains. Studies of the similar gene in mice suggested that this PTP may be involved in cell-cell interaction, primary axonogenesis, and axon guidance during embryogenesis. This PTP has been also implicated in the molecular control of adult nerve repair. Four alternatively spliced transcript variants, which encode distinct proteins, have been reported. protein tyrosine phosphatase, receptor type S PTPRS ENSG00000105426
29108 This gene encodes an adaptor protein that is composed of two protein-protein interaction domains: an N-terminal PYRIN-PAAD-DAPIN domain (PYD) and a C-terminal caspase-recruitment domain (CARD). The PYD and CARD domains are members of the six-helix bundle death domain-fold superfamily that mediates assembly of large signaling complexes in the inflammatory and apoptotic signaling pathways via the activation of caspase. In normal cells, this protein is localized to the cytoplasm; however, in cells undergoing apoptosis, it forms ball-like aggregates near the nuclear periphery. Two transcript variants encoding different isoforms have been found for this gene. PYD and CARD domain containing PYCARD ENSG00000103490
125058 NA TBC1 domain family member 16 TBC1D16 ENSG00000167291
ENSG00000243829 NA NA CTB-33G10.1 ENSG00000243829
# Write the queried Ensembl gene IDs for factor 10 to a text file, one ID per line.
write.table(as.factor(out$query), paste0("../utilities/GTEX2013_sparse_fac_voom/gene_names_clus_", 10, ".txt"),
            col.names = FALSE, row.names = FALSE, quote = FALSE);
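
The `write.table` call above saves every queried ID, including any that the annotation service could not resolve. When a query result carries a logical `notfound` column (`TRUE` for unmatched IDs, `NA` otherwise), one option is to drop those IDs before writing. The following is a minimal sketch under that assumption, not part of the original analysis: `out_df` is a toy stand-in for `as.data.frame(out)`, `resolved_queries` is a hypothetical helper, and the output goes to a temporary file rather than the project path.

```r
# Toy stand-in for as.data.frame(out); three illustrative rows only.
out_df <- data.frame(
  query    = c("ENSG00000134339", "ENSG00000180672", "ENSG00000029534"),
  symbol   = c("SAA2", NA, "ANK1"),
  notfound = c(NA, TRUE, NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the query IDs that were resolved.
# If the result has no `notfound` column, every ID is kept.
resolved_queries <- function(df) {
  if (!"notfound" %in% names(df)) {
    return(df$query)
  }
  # Keep rows where notfound is NA (matched) or explicitly FALSE.
  df$query[is.na(df$notfound) | !df$notfound]
}

kept <- resolved_queries(out_df)

# Write one ID per line, mirroring the options used in the report.
out_file <- tempfile(fileext = ".txt")
write.table(kept, out_file, col.names = FALSE, row.names = FALSE, quote = FALSE)
```

Whether unmatched IDs should be dropped depends on what consumes the file; keeping them, as the report does, preserves the full top-feature list for each factor.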

Factor 11 Annotations

out <- mygene::queryMany(gene_list[11,],  scopes="ensembl.gene", fields=c("name", "summary", "symbol"), species="human");
## Finished
## Pass returnall=TRUE to return lists of duplicate or missing query terms.
kable(as.data.frame(out))
name query symbol X_id summary notfound
serum amyloid A2 ENSG00000134339 SAA2 6289 NA NA
SAA2-SAA4 readthrough ENSG00000255071 SAA2-SAA4 100528017 This locus represents naturally occurring read-through transcription between the neighboring serum amyloid A2 and serum amyloid A4 genes on chromosome 11. The read-through transcript produces a fusion protein that shares sequence identity with each individual gene product. NA
ankyrin 1 ENSG00000029534 ANK1 286 Ankyrins are a family of proteins that link the integral membrane proteins to the underlying spectrin-actin cytoskeleton and play key roles in activities such as cell motility, activation, proliferation, contact and the maintenance of specialized membrane domains. Multiple isoforms of ankyrin with different affinities for various target proteins are expressed in a tissue-specific, developmentally regulated manner. Most ankyrins are typically composed of three structural domains: an amino-terminal domain containing multiple ankyrin repeats; a central region with a highly conserved spectrin binding domain; and a carboxy-terminal regulatory domain which is the least conserved and subject to variation. Ankyrin 1, the prototype of this family, was first discovered in the erythrocytes, but since has also been found in brain and muscles. Mutations in erythrocytic ankyrin 1 have been associated in approximately half of all patients with hereditary spherocytosis. Complex patterns of alternative splicing in the regulatory domain, giving rise to different isoforms of ankyrin 1 have been described. Truncated muscle-specific isoforms of ankyrin 1 resulting from usage of an alternate promoter have also been identified. NA
trophoblast glycoprotein ENSG00000146242 TPBG 7162 This gene encodes a leucine-rich transmembrane glycoprotein that may be involved in cell adhesion. The encoded protein is an oncofetal antigen that is specific to trophoblast cells. In adults this protein is highly expressed in many tumor cells and is associated with poor clinical outcome in numerous cancers. Alternate splicing in the 5’ UTR results in multiple transcript variants that encode the same protein. NA
glutathione S-transferase alpha 1 ENSG00000243955 GSTA1 2938 This gene encodes a member of a family of enzymes that function to add glutathione to target electrophilic compounds, including carcinogens, therapeutic drugs, environmental toxins, and products of oxidative stress. This action is an important step in detoxification of these compounds. This subfamily of enzymes has a particular role in protecting cells from reactive oxygen species and the products of peroxidation. Polymorphisms in this gene influence the ability of individuals to metabolize different drugs. This gene is located in a cluster of similar genes and pseudogenes on chromosome 6. Alternative splicing results in multiple transcript variants. NA
serum amyloid A1 ENSG00000173432 SAA1 6288 This gene encodes a member of the serum amyloid A family of apolipoproteins. The encoded preproprotein is proteolytically processed to generate the mature protein. This protein is a major acute phase protein that is highly expressed in response to inflammation and tissue injury. This protein also plays an important role in HDL metabolism and cholesterol homeostasis. High levels of this protein are associated with chronic inflammatory diseases including atherosclerosis, rheumatoid arthritis, Alzheimer’s disease and Crohn’s disease. This protein may also be a potential biomarker for certain tumors. Alternate splicing results in multiple transcript variants that encode the same protein. A pseudogene of this gene is found on chromosome 11. NA
tubulin beta 6 class V ENSG00000176014 TUBB6 84617 NA NA
NOTCH1 associated lncRNA in T-cell acute lymphoblastic leukemia 1 ENSG00000237886 NALT1 ENSG00000237886 NA NA
vitronectin ENSG00000109072 VTN 7448 The protein encoded by this gene is a member of the pexin family. It is found in serum and tissues and promotes cell adhesion and spreading, inhibits the membrane-damaging effect of the terminal cytolytic complement pathway, and binds to several serpin serine protease inhibitors. It is a secreted protein and exists in either a single chain form or a clipped, two chain form held together by a disulfide bond. NA
NA ENSG00000270670 RP11-248C1.3 ENSG00000270670 NA NA
NA ENSG00000242198 CTD-2235C13.1 ENSG00000242198 NA NA
NA ENSG00000251196 RP11-54F2.1 ENSG00000251196 NA NA
neuronal calcium sensor 1 ENSG00000107130 NCS1 23413 This gene is a member of the neuronal calcium sensor gene family, which encode calcium-binding proteins expressed predominantly in neurons. The protein encoded by this gene regulates G protein-coupled receptor phosphorylation in a calcium-dependent manner and can substitute for calmodulin. The protein is associated with secretory granules and modulates synaptic transmission and synaptic plasticity. Multiple transcript variants encoding different isoforms have been found for this gene. NA
NA ENSG00000233593 RP4-665J23.1 ENSG00000233593 NA NA
cofilin 1 (non-muscle) pseudogene 5 ENSG00000213830 CFL1P5 ENSG00000213830 NA NA
ribosomal protein L5 pseudogene 23 ENSG00000240395 RPL5P23 ENSG00000240395 NA NA
epithelial cell adhesion molecule ENSG00000119888 EPCAM 4072 This gene encodes a carcinoma-associated antigen and is a member of a family that includes at least two type I membrane proteins. This antigen is expressed on most normal epithelial cells and gastrointestinal carcinomas and functions as a homotypic calcium-independent cell adhesion molecule. The antigen is being used as a target for immunotherapy treatment of human carcinomas. Mutations in this gene result in congenital tufting enteropathy. NA
potassium calcium-activated channel subfamily M alpha 1 ENSG00000156113 KCNMA1 3778 MaxiK channels are large conductance, voltage and calcium-sensitive potassium channels which are fundamental to the control of smooth muscle tone and neuronal excitability. MaxiK channels can be formed by 2 subunits: the pore-forming alpha subunit, which is the product of this gene, and the modulatory beta subunit. Intracellular calcium regulates the physical association between the alpha and beta subunits. Alternatively spliced transcript variants encoding different isoforms have been identified. NA
NA ENSG00000253364 RP11-731F5.2 ENSG00000253364 NA NA
erythrocyte membrane protein band 4.1 ENSG00000159023 EPB41 2035 The protein encoded by this gene, together with spectrin and actin, constitutes the red cell membrane cytoskeletal network. This complex plays a critical role in erythrocyte shape and deformability. Mutations in this gene are associated with type 1 elliptocytosis (EL1). Alternatively spliced transcript variants encoding different isoforms have been described for this gene. NA
NA ENSG00000180672 NA NA NA TRUE
calsequestrin 2 ENSG00000118729 CASQ2 845 The protein encoded by this gene specifies the cardiac muscle family member of the calsequestrin family. Calsequestrin is localized to the sarcoplasmic reticulum in cardiac and slow skeletal muscle cells. The protein is a calcium binding protein that stores calcium for muscle function. Mutations in this gene cause stress-induced polymorphic ventricular tachycardia, also referred to as catecholaminergic polymorphic ventricular tachycardia 2 (CPVT2), a disease characterized by bidirectional ventricular tachycardia that may lead to cardiac arrest. NA
apolipoprotein A2 ENSG00000158874 APOA2 336 This gene encodes apolipoprotein (apo-) A-II, which is the second most abundant protein of the high density lipoprotein particles. The protein is found in plasma as a monomer, homodimer, or heterodimer with apolipoprotein D. Defects in this gene may result in apolipoprotein A-II deficiency or hypercholesterolemia. NA
transmembrane protein 54 ENSG00000121900 TMEM54 113452 NA NA
phosphatidylinositol glycan anchor biosynthesis class H pseudogene 1 ENSG00000259657 PIGHP1 ENSG00000259657 NA NA
NA ENSG00000234638 AC053503.6 ENSG00000234638 NA NA
cell death inducing DFFA like effector c ENSG00000187288 CIDEC 63924 This gene encodes a member of the cell death-inducing DNA fragmentation factor-like effector family. Members of this family play important roles in apoptosis. The encoded protein promotes lipid droplet formation in adipocytes and may mediate adipocyte apoptosis. This gene is regulated by insulin and its expression is positively correlated with insulin sensitivity. Mutations in this gene may contribute to insulin resistant diabetes. A pseudogene of this gene is located on the short arm of chromosome 3. Alternatively spliced transcript variants that encode different isoforms have been observed for this gene. NA
NA ENSG00000255139 AP000442.1 ENSG00000255139 NA NA
caveolin 1 ENSG00000105974 CAV1 857 The scaffolding protein encoded by this gene is the main component of the caveolae plasma membranes found in most cell types. The protein links integrin subunits to the tyrosine kinase FYN, an initiating step in coupling integrins to the Ras-ERK pathway and promoting cell cycle progression. The gene is a tumor suppressor gene candidate and a negative regulator of the Ras-p42/44 mitogen-activated kinase cascade. Caveolin 1 and caveolin 2 are located next to each other on chromosome 7 and express colocalizing proteins that form a stable hetero-oligomeric complex. Mutations in this gene have been associated with Berardinelli-Seip congenital lipodystrophy. Alternatively spliced transcripts encode alpha and beta isoforms of caveolin 1. NA
long intergenic non-protein coding RNA 865 ENSG00000232229 LINC00865 643529 NA NA
NA ENSG00000224818 RP11-134G8.10 ENSG00000224818 NA NA
sulfotransferase family 1A member 2 ENSG00000197165 SULT1A2 6799 Sulfotransferase enzymes catalyze the sulfate conjugation of many hormones, neurotransmitters, drugs, and xenobiotic compounds. These cytosolic enzymes are different in their tissue distributions and substrate specificities. The gene structure (number and length of exons) is similar among family members. This gene encodes one of two phenol sulfotransferases with thermostable enzyme activity. Two alternatively spliced variants that encode the same protein have been described. NA
B-cell translocation gene 1, anti-proliferative ENSG00000133639 BTG1 694 This gene is a member of an anti-proliferative gene family that regulates cell growth and differentiation. Expression of this gene is highest in the G0/G1 phases of the cell cycle and is downregulated when cells progress through G1. The encoded protein interacts with several nuclear receptors, and functions as a coactivator of cell differentiation. This locus has been shown to be involved in a t(8;12)(q24;q22) chromosomal translocation in a case of B-cell chronic lymphocytic leukemia. NA
TNF alpha induced protein 8 like 3 ENSG00000183578 TNFAIP8L3 388121 NA NA
nocturnin ENSG00000151014 NOCT 25819 The protein encoded by this gene is highly similar to Nocturnin, a gene identified as a circadian clock regulated gene in Xenopus laevis. This protein and Nocturnin protein share similarity with the C-terminal domain of a yeast transcription factor, carbon catabolite repression 4 (CCR4). The mRNA abundance of a similar gene in mouse has been shown to exhibit circadian rhythmicity, which suggests a role for this protein in clock function or as a circadian clock effector. NA
NA ENSG00000261337 NA NA NA TRUE
ribokinase ENSG00000171174 RBKS 64080 This gene encodes a member of the carbohydrate kinase PfkB family. The encoded protein phosphorylates ribose to form ribose-5-phosphate in the presence of ATP and magnesium as a first step in ribose metabolism. Alternative splicing results in multiple transcript variants. NA
NA ENSG00000236234 AC091132.1 ENSG00000236234 NA NA
NA ENSG00000236213 AC006369.2 ENSG00000236213 NA NA
interferon induced transmembrane protein 10 ENSG00000244242 IFITM10 402778 NA NA
granzyme M ENSG00000197540 GZMM 3004 Human natural killer (NK) cells and activated lymphocytes express and store a distinct subset of neutral serine proteases together with proteoglycans and other immune effector molecules in large cytoplasmic granules. These serine proteases are collectively termed granzymes and include 4 distinct gene products: granzyme A, granzyme B, granzyme H, and the protein encoded by this gene, granzyme M. Two transcript variants encoding different isoforms have been found for this gene. NA
nucleophosmin 1 (nucleolar phosphoprotein B23, numatrin) pseudogene 37 ENSG00000219085 NPM1P37 ENSG00000219085 NA NA
desmin ENSG00000175084 DES 1674 This gene encodes a muscle-specific class III intermediate filament. Homopolymers of this protein form a stable intracytoplasmic filamentous network connecting myofibrils to each other and to the plasma membrane. Mutations in this gene are associated with desmin-related myopathy, a familial cardiac and skeletal myopathy (CSM), and with distal myopathies. NA
fatty acid binding protein 4 ENSG00000170323 FABP4 2167 FABP4 encodes the fatty acid binding protein found in adipocytes. Fatty acid binding proteins are a family of small, highly conserved, cytoplasmic proteins that bind long-chain fatty acids and other hydrophobic ligands. It is thought that FABPs’ roles include fatty acid uptake, transport, and metabolism. NA
long intergenic non-protein coding RNA 1160 ENSG00000231346 LINC01160 ENSG00000231346 NA NA
NA ENSG00000261136 RP11-37C7.3 ENSG00000261136 NA NA
ribosomal protein S20 pseudogene 22 ENSG00000239218 RPS20P22 ENSG00000239218 NA NA
mesenteric estrogen dependent adipogenesis ENSG00000102802 MEDAG 84935 NA NA
shisa family member 3 ENSG00000178343 SHISA3 152573 NA NA
RNA, 7SL, cytoplasmic 608, pseudogene ENSG00000239884 RN7SL608P ENSG00000239884 NA NA
protein phosphatase, Mg2+/Mn2+ dependent 1H ENSG00000111110 PPM1H 57460 NA NA
syntaxin binding protein 6 ENSG00000168952 STXBP6 29091 STXBP6 binds components of the SNARE complex (see MIM 603215) and may be involved in regulating SNARE complex formation (Scales et al., 2002 [PubMed 12145319]). NA
NA ENSG00000229512 AC068580.5 ENSG00000229512 NA NA
insulin induced gene 1 ENSG00000186480 INSIG1 3638 Oxysterols regulate cholesterol homeostasis through the liver X receptor (LXR)- and sterol regulatory element-binding protein (SREBP)-mediated signaling pathways. This gene is an insulin-induced gene. It encodes an endoplasmic reticulum (ER) membrane protein that plays a critical role in regulating cholesterol concentrations in cells. This protein binds to the sterol-sensing domains of SREBP cleavage-activating protein (SCAP) and HMG CoA reductase, and is essential for the sterol-mediated trafficking of the two proteins. Alternatively spliced transcript variants encoding distinct isoforms have been observed. NA
activin A receptor like type 1 ENSG00000139567 ACVRL1 94 This gene encodes a type I cell-surface receptor for the TGF-beta superfamily of ligands. It shares with other type I receptors a high degree of similarity in serine-threonine kinase subdomains, a glycine- and serine-rich region (called the GS domain) preceding the kinase domain, and a short C-terminal tail. The encoded protein, sometimes termed ALK1, shares similar domain structures with other closely related ALK or activin receptor-like kinase proteins that form a subfamily of receptor serine/threonine kinases. Mutations in this gene are associated with hemorrhagic telangiectasia type 2, also known as Rendu-Osler-Weber syndrome 2. NA
NA ENSG00000234329 RP11-767N6.2 ENSG00000234329 NA NA
RAB36, member RAS oncogene family ENSG00000100228 RAB36 9609 NA NA
activated leukocyte cell adhesion molecule ENSG00000170017 ALCAM 214 This gene encodes activated leukocyte cell adhesion molecule (ALCAM), also known as CD166 (cluster of differentiation 166), which is a member of a subfamily of immunoglobulin receptors with five immunoglobulin-like domains (VVC2C2C2) in the extracellular domain. This protein binds to T-cell differentiation antigen CD6, and is implicated in the processes of cell adhesion and migration. Multiple alternatively spliced transcript variants encoding different isoforms have been found. NA
repulsive guidance molecule family member a ENSG00000182175 RGMA 56963 This gene encodes a member of the repulsive guidance molecule family. The encoded protein is a glycosylphosphatidylinositol-anchored glycoprotein that functions as an axon guidance protein in the developing and adult central nervous system. This protein may also function as a tumor suppressor in some cancers. Alternate splicing results in multiple transcript variants. NA
phospholipase A2 group IVB ENSG00000243708 PLA2G4B ENSG00000243708 NA NA
myosin, heavy chain 7, cardiac muscle, beta ENSG00000092054 MYH7 4625 Muscle myosin is a hexameric protein containing 2 heavy chain subunits, 2 alkali light chain subunits, and 2 regulatory light chain subunits. This gene encodes the beta (or slow) heavy chain subunit of cardiac myosin. It is expressed predominantly in normal human ventricle. It is also expressed in skeletal muscle tissues rich in slow-twitch type I muscle fibers. Changes in the relative abundance of this protein and the alpha (or fast) heavy subunit of cardiac myosin correlate with the contractile velocity of cardiac muscle. Its expression is also altered during thyroid hormone depletion and hemodynamic overloading. Mutations in this gene are associated with familial hypertrophic cardiomyopathy, myosin storage myopathy, dilated cardiomyopathy, and Laing early-onset distal myopathy. NA
prolactin ENSG00000172179 PRL 5617 This gene encodes the anterior pituitary hormone prolactin. This secreted hormone is a growth regulator for many tissues, including cells of the immune system. It may also play a role in cell survival by suppressing apoptosis, and it is essential for lactation. Alternative splicing results in multiple transcript variants that encode the same protein. NA
hydroxysteroid 17-beta dehydrogenase 6 ENSG00000025423 HSD17B6 8630 The protein encoded by this gene has both oxidoreductase and epimerase activities and is involved in androgen catabolism. The oxidoreductase activity can convert 3 alpha-adiol to dihydrotestosterone, while the epimerase activity can convert androsterone to epi-androsterone. Both reactions use NAD+ as the preferred cofactor. This gene is a member of the retinol dehydrogenase family. NA
NA ENSG00000266498 RP11-45M22.5 ENSG00000266498 NA NA
atypical chemokine receptor 3 ENSG00000144476 ACKR3 57007 This gene encodes a member of the G-protein coupled receptor family. Although this protein was earlier thought to be a receptor for vasoactive intestinal peptide (VIP), it is now considered to be an orphan receptor, in that its endogenous ligand has not been identified. The protein is also a coreceptor for human immunodeficiency viruses (HIV). Translocations involving this gene and HMGA2 on chromosome 12 have been observed in lipomas. NA
acyl-CoA thioesterase 4 ENSG00000177465 ACOT4 122970 NA NA
paternally expressed 3 ENSG00000198300 PEG3 5178 In human, ZIM2 and PEG3 are treated as two distinct genes though they share multiple 5’ exons and a common promoter and both genes are paternally expressed (PMID:15203203). Alternative splicing events connect their shared 5’ exons either with the remaining 4 exons unique to ZIM2, or with the remaining 2 exons unique to PEG3. In contrast, in other mammals ZIM2 does not undergo imprinting and, in mouse, cow, and likely other mammals as well, the ZIM2 and PEG3 genes do not share exons. Human PEG3 protein belongs to the Kruppel C2H2-type zinc finger protein family. PEG3 may play a role in cell proliferation and p53-mediated apoptosis. PEG3 has also shown tumor suppressor activity and tumorigenesis in glioma and ovarian cells. Alternative splicing of this PEG3 gene results in multiple transcript variants encoding distinct isoforms. NA
NA ENSG00000261759 RP11-626G11.3 ENSG00000261759 NA NA
hexokinase 3 ENSG00000160883 HK3 3101 Hexokinases phosphorylate glucose to produce glucose-6-phosphate, the first step in most glucose metabolism pathways. This gene encodes hexokinase 3. Similar to hexokinases 1 and 2, this allosteric enzyme is inhibited by its product glucose-6-phosphate. NA
zinc finger FYVE-type containing 28 ENSG00000159733 ZFYVE28 57732 NA NA
cytokine receptor like factor 1 ENSG00000006016 CRLF1 9244 This gene encodes a member of the cytokine type I receptor family. The protein forms a secreted complex with cardiotrophin-like cytokine factor 1 and acts on cells expressing ciliary neurotrophic factor receptors. The complex can promote survival of neuronal cells. Mutations in this gene result in Crisponi syndrome and cold-induced sweating syndrome. NA
testin LIM domain protein ENSG00000135269 TES 26136 Cancer-associated chromosomal changes often involve regions containing fragile sites. This gene maps to a common fragile site on chromosome 7q31.2 designated FRA7G. This gene is similar to mouse Testin, a testosterone-responsive gene encoding a Sertoli cell secretory protein containing three LIM domains. LIM domains are double zinc-finger motifs that mediate protein-protein interactions between transcription factors, cytoskeletal proteins and signaling proteins. This protein is a negative regulator of cell growth and may act as a tumor suppressor. This scaffold protein may also play a role in cell adhesion, cell spreading and in the reorganization of the actin cytoskeleton. Multiple protein isoforms are encoded by transcript variants of this gene. NA
peptide YY, 2 (pseudogene) ENSG00000237575 PYY2 23615 NA NA
basic leucine zipper ATF-like transcription factor ENSG00000156127 BATF 10538 The protein encoded by this gene is a nuclear basic leucine zipper protein that belongs to the AP-1/ATF superfamily of transcription factors. The leucine zipper of this protein mediates dimerization with members of the Jun family of proteins. This protein is thought to be a negative regulator of AP-1/ATF transcriptional events. NA
nephrocystin 1 ENSG00000144061 NPHP1 4867 This gene encodes a protein with src homology domain 3 (SH3) patterns. This protein interacts with Crk-associated substrate, and it appears to function in the control of cell division, as well as in cell-cell and cell-matrix adhesion signaling, likely as part of a multifunctional complex localized in actin- and microtubule-based structures. Mutations in this gene cause familial juvenile nephronophthisis type 1, a kidney disorder involving both tubules and glomeruli. Defects in this gene are also associated with Senior-Loken syndrome type 1, also referred to as juvenile nephronophthisis with Leber amaurosis, which is characterized by kidney and eye disease, and with Joubert syndrome type 4, which is characterized by cerebellar ataxia, oculomotor apraxia, psychomotor delay and neonatal breathing abnormalities, sometimes including retinal dystrophy and renal disease. Multiple transcript variants encoding different isoforms have been found for this gene. NA
cysteine and glycine rich protein 3 ENSG00000129170 CSRP3 8048 This gene encodes a member of the CSRP family of LIM domain proteins, which may be involved in regulatory processes important for development and cellular differentiation. The LIM/double zinc-finger motif found in this protein is found in a group of proteins with critical functions in gene regulation, cell growth, and somatic differentiation. Mutations in this gene are thought to cause heritable forms of hypertrophic cardiomyopathy (HCM) and dilated cardiomyopathy (DCM) in humans. Alternatively spliced transcript variants with different 5’ UTR, but encoding the same protein, have been found for this gene. NA
ADP ribosylation factor like GTPase 4D ENSG00000175906 ARL4D 379 ADP-ribosylation factor 4D is a member of the ADP-ribosylation factor family of GTP-binding proteins. ARL4D is closely similar to ARL4A and ARL4C and each has a nuclear localization signal and an unusually high guanine nucleotide exchange rate. This protein may play a role in membrane-associated intracellular trafficking. Mutations in this gene have been associated with Bardet-Biedl syndrome (BBS). NA
NA ENSG00000273018 CTD-2303H24.2 ENSG00000273018 NA NA
leucine rich repeats and immunoglobulin like domains 3 ENSG00000139263 LRIG3 121227 NA NA
low density lipoprotein receptor ENSG00000130164 LDLR 3949 The low density lipoprotein receptor (LDLR) gene family consists of cell surface proteins involved in receptor-mediated endocytosis of specific ligands. Low density lipoprotein (LDL) is normally bound at the cell membrane and taken into the cell ending up in lysosomes where the protein is degraded and the cholesterol is made available for repression of microsomal enzyme 3-hydroxy-3-methylglutaryl coenzyme A (HMG CoA) reductase, the rate-limiting step in cholesterol synthesis. At the same time, a reciprocal stimulation of cholesterol ester synthesis takes place. Mutations in this gene cause the autosomal dominant disorder, familial hypercholesterolemia. Alternate splicing results in multiple transcript variants. NA
copine 5 ENSG00000124772 CPNE5 57699 Calcium-dependent membrane-binding proteins may regulate molecular events at the interface of the cell membrane and cytoplasm. This gene is one of several genes that encode a calcium-dependent protein containing two N-terminal type II C2 domains and an integrin A domain-like sequence in the C-terminus. Several alternatively spliced transcript variants encoding different isoforms have been found for this gene. More variants may exist, but their full-length natures could not be determined. NA
macrophage scavenger receptor 1 ENSG00000038945 MSR1 4481 This gene encodes the class A macrophage scavenger receptors, which include three different types (1, 2, 3) generated by alternative splicing of this gene. These receptors or isoforms are macrophage-specific trimeric integral membrane glycoproteins and have been implicated in many macrophage-associated physiological and pathological processes including atherosclerosis, Alzheimer’s disease, and host defense. The isoforms type 1 and type 2 are functional receptors and are able to mediate the endocytosis of modified low density lipoproteins (LDLs). The isoform type 3 does not internalize modified LDL (acetyl-LDL) despite having the domain shown to mediate this function in the types 1 and 2 isoforms. It has an altered intracellular processing and is trapped within the endoplasmic reticulum, making it unable to perform endocytosis. The isoform type 3 can inhibit the function of isoforms type 1 and type 2 when co-expressed, indicating a dominant negative effect and suggesting a mechanism for regulation of scavenger receptor activity in macrophages. NA
NA ENSG00000264924 RP11-799B12.2 ENSG00000264924 NA NA
complement component 4B (Chido blood group) ENSG00000224389 C4B 721 This gene encodes the basic form of complement factor 4, part of the classical activation pathway. The protein is expressed as a single chain precursor which is proteolytically cleaved into a trimer of alpha, beta, and gamma chains prior to secretion. The trimer provides a surface for interaction between the antigen-antibody complex and other complement components. The alpha chain may be cleaved to release C4 anaphylatoxin, a mediator of local inflammation. Deficiency of this protein is associated with systemic lupus erythematosus. This gene localizes to the major histocompatibility complex (MHC) class III region on chromosome 6. Varying haplotypes of this gene cluster exist, such that individuals may have 1, 2, or 3 copies of this gene. In addition, this gene exists as a long form and a short form due to the presence or absence of a 6.4 kb endogenous HERV-K retrovirus in intron 9. NA
chromogranin A ENSG00000100604 CHGA 1113 The protein encoded by this gene is a member of the chromogranin/secretogranin family of neuroendocrine secretory proteins. It is found in secretory vesicles of neurons and endocrine cells. This gene product is a precursor to three biologically active peptides; vasostatin, pancreastatin, and parastatin. These peptides act as autocrine or paracrine negative modulators of the neuroendocrine system. Two other peptides, catestatin and chromofungin, have antimicrobial activity and antifungal activity, respectively. Two transcript variants encoding different isoforms have been found for this gene. NA
FK506 binding protein 5 ENSG00000096060 FKBP5 2289 The protein encoded by this gene is a member of the immunophilin protein family, which play a role in immunoregulation and basic cellular processes involving protein folding and trafficking. This encoded protein is a cis-trans prolyl isomerase that binds to the immunosuppressants FK506 and rapamycin. It is thought to mediate calcineurin inhibition. It also interacts functionally with mature hetero-oligomeric progesterone receptor complexes along with the 90 kDa heat shock protein and P23 protein. This gene has been found to have multiple polyadenylation sites. Alternative splicing results in multiple transcript variants. NA
cerebellin 3 precursor ENSG00000139899 CBLN3 643866 Members of the precerebellin family, such as CBLN3, contain a cerebellin motif (see CBLN1; MIM 600432) and a C-terminal C1q signature domain (see MIM 120550) that mediates trimeric assembly of atypical collagen complexes. However, precerebellins do not contain a collagen motif, suggesting that they are not conventional components of the extracellular matrix (Pang et al., 2000 [PubMed 10964938]). NA
adenosylmethionine decarboxylase 1 pseudogene 3 ENSG00000249286 AMD1P3 ENSG00000249286 NA NA
epithelial membrane protein 1 ENSG00000134531 EMP1 2012 NA NA
Epstein-Barr virus induced 3 ENSG00000105246 EBI3 10148 This gene was identified by its induced expression in B lymphocytes in response to Epstein-Barr virus infection. It encodes a secreted glycoprotein belonging to the hematopoietin receptor family, and heterodimerizes with a 28 kDa protein to form interleukin 27 (IL-27). IL-27 regulates T cell and inflammatory responses, in part by activating the Jak/STAT pathway of CD4+ T cells. NA
5’-aminolevulinate synthase 1 ENSG00000023330 ALAS1 211 This gene encodes the mitochondrial enzyme which catalyzes the rate-limiting step in heme (iron-protoporphyrin) biosynthesis. The enzyme encoded by this gene is the housekeeping enzyme; a separate gene encodes a form of the enzyme that is specific for erythroid tissue. The level of the mature encoded protein is regulated by heme: high levels of heme down-regulate the mature enzyme in mitochondria while low heme levels up-regulate. A pseudogene of this gene is located on chromosome 12. Alternative splicing results in multiple transcript variants encoding different isoforms. NA
NA ENSG00000262905 RP5-1029F21.2 ENSG00000262905 NA NA
serine peptidase inhibitor, Kunitz type, 2 ENSG00000167642 SPINT2 10653 This gene encodes a transmembrane protein with two extracellular Kunitz domains that inhibits a variety of serine proteases. The protein inhibits HGF activator which prevents the formation of active hepatocyte growth factor. This gene is a putative tumor suppressor, and mutations in this gene result in congenital sodium diarrhea. Multiple transcript variants encoding different isoforms have been found for this gene. NA
matrix Gla protein ENSG00000111341 MGP 4256 The protein encoded by this gene is secreted and likely acts as an inhibitor of bone formation. The encoded protein is found in the organic matrix of bone and cartilage. Defects in this gene are a cause of Keutel syndrome (KS). Two transcript variants encoding different isoforms have been found for this gene. NA
myomesin 2 ENSG00000036448 MYOM2 9172 The giant protein titin, together with its associated proteins, interconnects the major structure of sarcomeres, the M bands and Z discs. The C-terminal end of the titin string extends into the M line, where it binds tightly to M-band constituents of apparent molecular masses of 190 kD and 165 kD. The predicted MYOM2 protein contains 1,465 amino acids. Like MYOM1, MYOM2 has a unique N-terminal domain followed by 12 repeat domains with strong homology to either fibronectin type III or immunoglobulin C2 domains. Protein sequence comparisons suggested that the MYOM2 protein and bovine M protein are identical. NA
keratin 2 ENSG00000172867 KRT2 3849 The protein encoded by this gene is a member of the keratin gene family. The type II cytokeratins consist of basic or neutral proteins which are arranged in pairs of heterotypic keratin chains coexpressed during differentiation of simple and stratified epithelial tissues. This type II cytokeratin is expressed largely in the upper spinous layer of epidermal keratinocytes and mutations in this gene have been associated with bullous congenital ichthyosiform erythroderma. The type II cytokeratins are clustered in a region of chromosome 12q12-q13. NA
SEL1L family member 3 ENSG00000091490 SEL1L3 23231 NA NA
cysteine rich secretory protein LCCL domain containing 2 ENSG00000103196 CRISPLD2 83716 NA NA
myosin light chain, phosphorylatable, fast skeletal muscle ENSG00000180209 MYLPF 29895 NA NA
NA ENSG00000232450 RP4-730K3.3 ENSG00000232450 NA NA
# Save the Ensembl IDs queried for factor 11, one ID per line.
write.table(as.factor(out$query), paste0("../utilities/GTEX2013_sparse_fac_voom/gene_names_clus_", 11, ".txt"),
            col.names = FALSE, row.names = FALSE, quote = FALSE)
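
The export step is repeated once per factor with the factor number typed in by hand. A minimal sketch of how it could be wrapped in a function is given below. The helper name `write_factor_genes`, its arguments, and the toy data frame are hypothetical and not part of the original analysis. The helper also drops IDs that mygene flags in the `notfound` column, which is an assumption about what is wanted downstream; the original call writes every queried ID.

```r
# Hypothetical helper (not part of the original analysis): write the queried
# Ensembl IDs for one factor to a one-column text file.
write_factor_genes <- function(annot, factor_id, out_dir) {
  ids <- as.character(annot$query)
  # Assumption: IDs flagged as not found by mygene are dropped.
  # Remove this block to reproduce the original behaviour (all IDs written).
  if ("notfound" %in% names(annot)) {
    ids <- ids[is.na(annot$notfound) | !annot$notfound]
  }
  out_file <- file.path(out_dir, paste0("gene_names_clus_", factor_id, ".txt"))
  write.table(ids, out_file, col.names = FALSE, row.names = FALSE, quote = FALSE)
  invisible(out_file)
}

# Toy stand-in for a mygene::queryMany() result, so the sketch runs offline.
toy <- data.frame(
  query = c("ENSG00000111206", "ENSG00000034063", "ENSG00000087586"),
  symbol = c("FOXM1", NA, "AURKA"),
  notfound = c(NA, TRUE, NA),
  stringsAsFactors = FALSE
)

path <- write_factor_genes(toy, 12, tempdir())
written <- readLines(path)
```

In the report itself the call would take the form `write_factor_genes(as.data.frame(out), 11, "../utilities/GTEX2013_sparse_fac_voom")`, assuming the output directory already exists.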

Factor 12 Annotations

out <- mygene::queryMany(gene_list[12,],  scopes="ensembl.gene", fields=c("name", "summary", "symbol"), species="human");
## Finished
## Pass returnall=TRUE to return lists of duplicate or missing query terms.
kable(as.data.frame(out))
summary query name X_id symbol notfound
The protein encoded by this gene is a transcriptional activator involved in cell proliferation. The encoded protein is phosphorylated in M phase and regulates the expression of several cell cycle genes, such as cyclin B1 and cyclin D1. Several transcript variants encoding different isoforms have been found for this gene. ENSG00000111206 forkhead box M1 2305 FOXM1 NA
NA ENSG00000249790 NA ENSG00000249790 RP11-20D14.6 NA
NA ENSG00000237649 kinesin family member C1 3833 KIFC1 NA
This gene encodes a member of the runt domain-containing family of transcription factors. A heterodimer of this protein and a beta subunit forms a complex that binds to the core DNA sequence 5’-PYGPYGGT-3’ found in a number of enhancers and promoters, and can either activate or suppress transcription. It also interacts with other transcription factors. It functions as a tumor suppressor, and the gene is frequently deleted or transcriptionally silenced in cancer. Alternative splicing results in multiple transcript variants. ENSG00000020633 runt related transcription factor 3 864 RUNX3 NA
The protein encoded by this gene is a glutathione-independent prostaglandin D synthase that catalyzes the conversion of prostaglandin H2 (PGH2) to prostaglandin D2 (PGD2). PGD2 functions as a neuromodulator as well as a trophic factor in the central nervous system. PGD2 is also involved in smooth muscle contraction/relaxation and is a potent inhibitor of platelet aggregation. This gene is preferentially expressed in brain. Studies with transgenic mice overexpressing this gene suggest that this gene may also be involved in the regulation of non-rapid eye movement sleep. ENSG00000107317 prostaglandin D2 synthase 5730 PTGDS NA
NA ENSG00000228477 NA ENSG00000228477 RP3-342P20.2 NA
This gene encodes an extracellular matrix protein with a spatially and temporally restricted tissue distribution. This protein is homohexameric with disulfide-linked subunits, and contains multiple EGF-like and fibronectin type-III domains. It is implicated in guidance of migrating neurons as well as axons during development, synaptic plasticity, and neuronal regeneration. ENSG00000041982 tenascin C 3371 TNC NA
This gene encodes a member of the galactose-3-O-sulfotransferase protein family. The product of this gene catalyzes sulfonation by transferring a sulfate to the C-3’ position of galactose residues in O-linked glycoproteins. This enzyme is highly specific for core 1 structures, with asialofetuin, Gal-beta-1,3-GalNAc and Gal-beta-1,3 (GlcNAc-beta-1,6)GalNAc being good substrates. ENSG00000197093 galactose-3-O-sulfotransferase 4 79690 GAL3ST4 NA
This gene encodes a member of the carboxypeptidase A family of zinc metalloproteases. This enzyme is produced in the pancreas and preferentially cleaves C-terminal branched-chain and aromatic amino acids from dietary proteins. This gene and several family members are present in a gene cluster on chromosome 7. Mutations in this gene may be linked to chronic pancreatitis, while elevated protein levels may be associated with pancreatic cancer. ENSG00000091704 carboxypeptidase A1 1357 CPA1 NA
U3 RNA, an abundant small nucleolar RNA (snoRNA), is thought to play a role in the processing of ribosomal RNA precursors (Bernstein et al., 1983 [PubMed 6186397]). ENSG00000263934 small nucleolar RNA, C/D box 3A 780851 SNORD3A NA
NA ENSG00000233695 GAS6 antisense RNA 1 ENSG00000233695 GAS6-AS1 NA
The protein encoded by this gene belongs to a family of sarcomeric proteins that bind to calcineurin, a phosphatase involved in calcium-dependent signal transduction in diverse cell types. These family members tether calcineurin to alpha-actinin at the z-line of the sarcomere of cardiac and skeletal muscle cells, and thus they are important for calcineurin signaling. Mutations in this gene cause cardiomyopathy familial hypertrophic type 16, a hereditary heart disorder. ENSG00000172399 myozenin 2 51778 MYOZ2 NA
NA ENSG00000034063 NA NA NA TRUE
NA ENSG00000260686 NA ENSG00000260686 CTB-36H16.2 NA
This gene encodes an extracellular matrix protein, which belongs to the fibulin family. This protein binds various extracellular ligands and calcium. It may play a role during organ development, in particular, during the differentiation of heart, skeletal and neuronal structures. Alternatively spliced transcript variants encoding different isoforms have been identified. ENSG00000163520 fibulin 2 2199 FBLN2 NA
This gene encodes a member of the KDEL endoplasmic reticulum protein retention receptor family. Retention of resident soluble proteins in the lumen of the endoplasmic reticulum (ER) is achieved in both yeast and animal cells by their continual retrieval from the cis-Golgi, or a pre-Golgi compartment. Sorting of these proteins is dependent on a C-terminal tetrapeptide signal, usually lys-asp-glu-leu (KDEL) in animal cells, and his-asp-glu-leu (HDEL) in S. cerevisiae. This process is mediated by a receptor that recognizes, and binds the tetrapeptide-containing protein, and returns it to the ER. In yeast, the sorting receptor encoded by a single gene, ERD2, is a seven-transmembrane protein. Unlike yeast, several human homologs of the ERD2 gene, constituting the KDEL receptor gene family, have been described. KDELR3 was the third member of the family to be identified. Alternate splicing results in multiple transcript variants. ENSG00000100196 KDEL endoplasmic reticulum protein retention receptor 3 11015 KDELR3 NA
This gene is a member of the cytidine deaminase gene family. It is one of seven related genes or pseudogenes found in a cluster thought to result from gene duplication, on chromosome 22. Members of the cluster encode proteins that are structurally and functionally related to the C to U RNA-editing cytidine deaminase APOBEC1. It is thought that the proteins may be RNA editing enzymes and have roles in growth or cell cycle control. ENSG00000244509 apolipoprotein B mRNA editing enzyme catalytic subunit 3C 27350 APOBEC3C NA
NA ENSG00000213846 NA ENSG00000213846 AC098614.2 NA
RMI2 is a component of the BLM (RECQL3; MIM 604610) complex, which plays a role in homologous recombination-dependent DNA repair and is essential for genome stability (Xu et al., 2008 [PubMed 18923082]). ENSG00000175643 RecQ mediated genome instability 2 116028 RMI2 NA
NA ENSG00000068489 proline rich 11 55771 PRR11 NA
Thymidylate synthase catalyzes the methylation of deoxyuridylate to deoxythymidylate using 5,10-methylenetetrahydrofolate (methylene-THF) as a cofactor. This function maintains the dTMP (thymidine-5-prime monophosphate) pool critical for DNA replication and repair. The enzyme has been of interest as a target for cancer chemotherapeutic agents. It is considered to be the primary site of action for 5-fluorouracil, 5-fluoro-2-prime-deoxyuridine, and some folate analogs. Expression of this gene and that of a naturally occurring antisense transcript rTSalpha (GeneID:55556) vary inversely when cell-growth progresses from late-log to plateau phase. ENSG00000176890 thymidylate synthetase 7298 TYMS NA
The protein encoded by this gene is involved in cell motility. It is expressed in breast tissue and together with other proteins, it forms a complex with BRCA1 and BRCA2, thus is potentially associated with higher risk of breast cancer. Alternatively spliced transcript variants encoding different isoforms have been noted for this gene. ENSG00000072571 hyaluronan mediated motility receptor 3161 HMMR NA
NA ENSG00000251196 NA ENSG00000251196 RP11-54F2.1 NA
NA ENSG00000175768 translocase of outer mitochondrial membrane 5 401505 TOMM5 NA
This gene encodes a member of the semaphorin family of proteins. The encoded preproprotein is proteolytically processed to generate the mature glycosylphosphatidylinositol (GPI)-anchored membrane glycoprotein. The encoded protein is found on activated lymphocytes and erythrocytes and may be involved in immunomodulatory and neuronal processes. The encoded protein carries the John Milton Hagen (JMH) blood group antigens. Mutations in this gene may be associated with reduced bone mineral density (BMD). Alternative splicing results in multiple transcript variants, at least one of which encodes an isoform that is proteolytically processed. ENSG00000138623 semaphorin 7A (John Milton Hagen blood group) 8482 SEMA7A NA
This gene encodes a member of the adaptor complexes small subunit family. The encoded protein is a subunit of the coatomer protein complex, a seven-subunit complex that functions in the formation of COPI-type, non-clathrin-coated vesicles. COPI vesicles function in the retrograde Golgi-to-ER transport of dilysine-tagged proteins. ENSG00000005243 coatomer protein complex subunit zeta 2 51226 COPZ2 NA
This gene encodes one of the three alpha chains of type VI collagen, a beaded filament collagen found in most connective tissues. The product of this gene contains several domains similar to von Willebrand Factor type A domains. These domains have been shown to bind extracellular matrix proteins, an interaction that explains the importance of this collagen in organizing matrix components. Mutations in this gene are associated with Bethlem myopathy and Ullrich scleroatonic muscular dystrophy. Three transcript variants have been identified for this gene. ENSG00000142173 collagen type VI alpha 2 1292 COL6A2 NA
This gene encodes a member of the arm-repeat (armadillo) and plakophilin gene families. Plakophilin proteins contain numerous armadillo repeats, localize to cell desmosomes and nuclei, and participate in linking cadherins to intermediate filaments in the cytoskeleton. This gene product may regulate the signaling activity of beta-catenin. Two alternately spliced transcripts encoding two protein isoforms have been identified. A processed pseudogene with high similarity to this locus has been mapped to chromosome 12p13. ENSG00000057294 plakophilin 2 5318 PKP2 NA
NA ENSG00000185697 MYB proto-oncogene like 1 4603 MYBL1 NA
NUSAP1 is a nucleolar-spindle-associated protein that plays a role in spindle microtubule organization (Raemaekers et al., 2003 [PubMed 12963707]). ENSG00000137804 nucleolar and spindle associated protein 1 51203 NUSAP1 NA
This gene encodes a trypsinogen, which is a member of the trypsin family of serine proteases. This enzyme is secreted by the pancreas and cleaved to its active form in the small intestine. It is active on peptide linkages involving the carboxyl group of lysine or arginine. Mutations in this gene are associated with hereditary pancreatitis. This gene and several other trypsinogen genes are localized to the T cell receptor beta locus on chromosome 7. ENSG00000204983 protease, serine 1 5644 PRSS1 NA
NA ENSG00000111665 cell division cycle associated 3 83461 CDCA3 NA
Cyclin B2 is a member of the cyclin family, specifically the B-type cyclins. The B-type cyclins, B1 and B2, associate with p34cdc2 and are essential components of the cell cycle regulatory machinery. B1 and B2 differ in their subcellular localization. Cyclin B1 co-localizes with microtubules, whereas cyclin B2 is primarily associated with the Golgi region. Cyclin B2 also binds to transforming growth factor beta RII and thus cyclin B2/cdc2 may play a key role in transforming growth factor beta-mediated cell cycle control. ENSG00000157456 cyclin B2 9133 CCNB2 NA
The protein encoded by this gene belongs to the flavoprotein pyridine nucleotide cytochrome reductase family of proteins. Cytochrome b-type NAD(P)H oxidoreductases are implicated in many processes including cholesterol biosynthesis, fatty acid desaturation and elongation, and respiratory burst in neutrophils and macrophages. Cytochrome b5 reductases have soluble and membrane-bound forms that are the product of alternative splicing. In animal cells, the membrane-bound form binds to the endoplasmic reticulum, where it is a member of a fatty acid desaturation complex. Alternative splicing results in multiple transcript variants. ENSG00000166394 cytochrome b5 reductase 2 51700 CYB5R2 NA
Tryptases comprise a family of trypsin-like serine proteases, the peptidase family S1. Tryptases are enzymatically active only as heparin-stabilized tetramers, and they are resistant to all known endogenous proteinase inhibitors. Several tryptase genes are clustered on chromosome 16p13.3. These genes are characterized by several distinct features. They have a highly conserved 3’ UTR and contain tandem repeat sequences at the 5’ flank and 3’ UTR which are thought to play a role in regulation of the mRNA stability. These genes have an intron immediately upstream of the initiator Met codon, which separates the site of transcription initiation from protein coding sequence. This feature is characteristic of tryptases but is unusual in other genes. The alleles of this gene exhibit an unusual amount of sequence variation, such that the alleles were once thought to represent two separate genes, alpha and beta 1. Beta tryptases appear to be the main isoenzymes expressed in mast cells; whereas in basophils, alpha tryptases predominate. Tryptases have been implicated as mediators in the pathogenesis of asthma and other allergic and inflammatory disorders. ENSG00000172236 tryptase alpha/beta 1 7177 TPSAB1 NA
NA ENSG00000150636 coiled-coil domain containing 102B 79839 CCDC102B NA
This gene is a member of the matrix metalloproteinase (MMP) gene family, that are zinc-dependent enzymes capable of cleaving components of the extracellular matrix and molecules involved in signal transduction. The protein encoded by this gene is a gelatinase A, type IV collagenase, that contains three fibronectin type II repeats in its catalytic site that allow binding of denatured type IV and V collagen and elastin. Unlike most MMP family members, activation of this protein can occur on the cell membrane. This enzyme can be activated extracellularly by proteases, or, intracellularly by its S-glutathiolation with no requirement for proteolytic removal of the pro-domain. This protein is thought to be involved in multiple pathways including roles in the nervous system, endometrial menstrual breakdown, regulation of vascularization, and metastasis. Mutations in this gene have been associated with Winchester syndrome and Nodulosis-Arthropathy-Osteolysis (NAO) syndrome. Alternative splicing results in multiple transcript variants encoding different isoforms. ENSG00000087245 matrix metallopeptidase 2 4313 MMP2 NA
NA ENSG00000168876 ankyrin repeat domain 49 54851 ANKRD49 NA
NA ENSG00000224729 PCOLCE antisense RNA 1 100129845 PCOLCE-AS1 NA
NA ENSG00000272016 NA NA NA TRUE
This gene encodes a member of the mannose receptor family of proteins that contain a fibronectin type II domain and multiple C-type lectin-like domains. The encoded protein plays a role in extracellular matrix remodeling by mediating the internalization and lysosomal degradation of collagen ligands. Expression of this gene may play a role in the tumorigenesis and metastasis of several malignancies including breast cancer, gliomas and metastatic bone disease. ENSG00000011028 mannose receptor C type 2 9902 MRC2 NA
This gene is a member of the RUNX family of transcription factors and encodes a nuclear protein with a Runt DNA-binding domain. This protein is essential for osteoblastic differentiation and skeletal morphogenesis and acts as a scaffold for nucleic acids and regulatory factors involved in skeletal gene expression. The protein can bind DNA both as a monomer or, with more affinity, as a subunit of a heterodimeric complex. Mutations in this gene have been associated with the bone development disorder cleidocranial dysplasia (CCD). Transcript variants that encode different protein isoforms result from the use of alternate promoters as well as alternate splicing. ENSG00000124813 runt related transcription factor 2 860 RUNX2 NA
The protein encoded by this gene is a secreted, extracellular matrix protein containing an Arg-Gly-Asp (RGD) motif and calcium-binding EGF-like domains. It promotes adhesion of endothelial cells through interaction of integrins and the RGD motif. It is prominently expressed in developing arteries but less so in adult vessels. However, its expression is reinduced in balloon-injured vessels and atherosclerotic lesions, notably in intimal vascular smooth muscle cells and endothelial cells. Therefore, the protein encoded by this gene may play a role in vascular development and remodeling. Defects in this gene are a cause of autosomal dominant cutis laxa, autosomal recessive cutis laxa type I (CL type I), and age-related macular degeneration type 3 (ARMD3). ENSG00000140092 fibulin 5 10516 FBLN5 NA
Protein disulfide isomerases (EC 5.3.4.1), such as PDIP, are endoplasmic reticulum (ER) resident proteins that catalyze protein folding and thiol-disulfide interchange reactions (Desilva et al., 1996 [PubMed 8561901]). ENSG00000185615 protein disulfide isomerase family A member 2 64714 PDIA2 NA
This gene is a member of a group of genes whose transcript levels are increased following stressful growth arrest conditions and treatment with DNA-damaging agents. The induction of this gene by ionizing radiation occurs in certain cell lines regardless of p53 status, and its protein response is correlated with apoptosis following ionizing radiation. ENSG00000087074 protein phosphatase 1 regulatory subunit 15A 23645 PPP1R15A NA
NA ENSG00000162878 protein kinase domain containing, cytoplasmic 91461 PKDCC NA
NA ENSG00000249835 VCAN antisense RNA 1 ENSG00000249835 VCAN-AS1 NA
NA ENSG00000095203 erythrocyte membrane protein band 4.1 like 4B 54566 EPB41L4B NA
NA ENSG00000088325 TPX2, microtubule nucleation factor 22974 TPX2 NA
The leucine-rich repeat (LRR) family of proteins, including LRG1, have been shown to be involved in protein-protein interaction, signal transduction, and cell adhesion and development. LRG1 is expressed during granulocyte differentiation (O’Donnell et al., 2002 [PubMed 12223515]). ENSG00000171236 leucine rich alpha-2-glycoprotein 1 116844 LRG1 NA
This gene encodes a gamma-carboxyglutamic acid (Gla)-containing protein thought to be involved in the stimulation of cell proliferation. This gene is frequently overexpressed in many cancers and has been implicated as an adverse prognostic marker. Elevated protein levels are additionally associated with a variety of disease states, including venous thromboembolic disease, systemic lupus erythematosus, chronic renal failure, and preeclampsia. ENSG00000183087 growth arrest specific 6 2621 GAS6 NA
Alpha actinins belong to the spectrin gene superfamily which represents a diverse group of cytoskeletal proteins, including the alpha and beta spectrins and dystrophins. Alpha actinin is an actin-binding protein with multiple roles in different cell types. In nonmuscle cells, the cytoskeletal isoform is found along microfilament bundles and adherens-type junctions, where it is involved in binding actin to the membrane. In contrast, skeletal, cardiac, and smooth muscle isoforms are localized to the Z-disc and analogous dense bodies, where they help anchor the myofibrillar actin filaments. This gene encodes a muscle-specific, alpha actinin isoform that is expressed in both skeletal and cardiac muscles. Several transcript variants encoding different isoforms have been found for this gene. ENSG00000077522 actinin alpha 2 88 ACTN2 NA
NA ENSG00000260296 NA ENSG00000260296 RP11-395I6.3 NA
This gene represents a member of the formin family of proteins. It is considered a diaphanous formin due to the presence of a diaphanous inhibitory domain located at the N-terminus of the encoded protein. Studies of a similar mouse protein indicate that the protein encoded by this locus may function in polymerization and depolymerization of actin filaments. Mutations at this locus have been associated with focal segmental glomerulosclerosis 5. ENSG00000203485 inverted formin, FH2 and WH2 domain containing 64423 INF2 NA
NA ENSG00000121690 DEP domain containing 7 91614 DEPDC7 NA
The protein encoded by this gene is a member of the transmembrane 4 superfamily, also known as the tetraspanin family. Most of these members are cell-surface proteins that are characterized by the presence of four hydrophobic domains. The proteins mediate signal transduction events that play a role in the regulation of cell development, activation, growth and motility. ENSG00000106537 tetraspanin 13 27075 TSPAN13 NA
This gene encodes a member of the fibrillin family of proteins. The encoded preproprotein is proteolytically processed to generate two proteins including the extracellular matrix component fibrillin-1 and the protein hormone asprosin. Fibrillin-1 is an extracellular matrix glycoprotein that serves as a structural component of calcium-binding microfibrils. These microfibrils provide force-bearing structural support in elastic and nonelastic connective tissue throughout the body. Asprosin, secreted by white adipose tissue, has been shown to regulate glucose homeostasis. Mutations in this gene are associated with Marfan syndrome and the related MASS phenotype, as well as ectopia lentis syndrome, Weill-Marchesani syndrome, Shprintzen-Goldberg syndrome and neonatal progeroid syndrome. ENSG00000166147 fibrillin 1 2200 FBN1 NA
Sterile alpha motifs (SAMs) in proteins such as SAMD4A are part of an RNA-binding domain that functions as a posttranscriptional regulator by binding to an RNA sequence motif known as the Smaug recognition element, which was named after the Drosophila Smaug protein (Baez and Boccaccio, 2005 [PubMed 16221671]). ENSG00000020577 sterile alpha motif domain containing 4A 23034 SAMD4A NA
NA ENSG00000204219 transcription elongation factor A3 6920 TCEA3 NA
Prostaglandin-endoperoxide synthase (PTGS), also known as cyclooxygenase, is the key enzyme in prostaglandin biosynthesis, and acts both as a dioxygenase and as a peroxidase. There are two isozymes of PTGS: a constitutive PTGS1 and an inducible PTGS2, which differ in their regulation of expression and tissue distribution. This gene encodes the inducible isozyme. It is regulated by specific stimulatory events, suggesting that it is responsible for the prostanoid biosynthesis involved in inflammation and mitogenesis. ENSG00000073756 prostaglandin-endoperoxide synthase 2 5743 PTGS2 NA
NA ENSG00000255443 NA ENSG00000255443 RP1-68D18.4 NA
The protein encoded by this gene is a transcription factor with three tandem C2H2-type zinc fingers. Defects in this gene are associated with Charcot-Marie-Tooth disease type 1D (CMT1D), Charcot-Marie-Tooth disease type 4E (CMT4E), and with Dejerine-Sottas syndrome (DSS). Multiple transcript variants encoding two different isoforms have been found for this gene. ENSG00000122877 early growth response 2 1959 EGR2 NA
NA ENSG00000168928 chymotrypsinogen B2 440387 CTRB2 NA
This gene encodes the vitamin K-dependent coagulation factor X of the blood coagulation cascade. This factor undergoes multiple processing steps before its preproprotein is converted to a mature two-chain form by the excision of the tripeptide RKR. Two chains of the factor are held together by 1 or more disulfide bonds; the light chain contains 2 EGF-like domains, while the heavy chain contains the catalytic domain which is structurally homologous to those of the other hemostatic serine proteases. The mature factor is activated by the cleavage of the activation peptide by factor IXa (in the intrinsic pathway), or by factor VIIa (in the extrinsic pathway). The activated factor then converts prothrombin to thrombin in the presence of factor Va, Ca+2, and phospholipid during blood clotting. Mutations of this gene result in factor X deficiency, a hemorrhagic condition of variable severity. Alternative splicing results in multiple transcript variants encoding different isoforms that may undergo similar proteolytic processing to generate mature polypeptides. ENSG00000126218 coagulation factor X 2159 F10 NA
NA ENSG00000261542 NA ENSG00000261542 RP11-16E18.3 NA
NA ENSG00000188707 ZBED6 C-terminal like 113763 ZBED6CL NA
The protein encoded by this gene is a cell cycle-regulated kinase that appears to be involved in microtubule formation and/or stabilization at the spindle pole during chromosome segregation. The encoded protein is found at the centrosome in interphase cells and at the spindle poles in mitosis. This gene may play a role in tumor development and progression. A processed pseudogene of this gene has been found on chromosome 1, and an unprocessed pseudogene has been found on chromosome 10. Multiple transcript variants encoding the same protein have been found for this gene. ENSG00000087586 aurora kinase A 6790 AURKA NA
The Shaker gene family of Drosophila encodes components of voltage-gated potassium channels and is comprised of four subfamilies. Based on sequence similarity, this gene is similar to one of these subfamilies, namely the Shaw subfamily. The protein encoded by this gene belongs to the delayed rectifier class of channel proteins and is an integral membrane protein that mediates the voltage-dependent potassium ion permeability of excitable membranes. Alternate splicing results in several transcript variants. ENSG00000131398 potassium voltage-gated channel subfamily C member 3 3748 KCNC3 NA
APOLD1 is an endothelial cell early response protein that may play a role in regulation of endothelial cell signaling and vascular function (Regard et al., 2004 [PubMed 15102925]). ENSG00000178878 apolipoprotein L domain containing 1 81575 APOLD1 NA
The protein encoded by this gene is a secretory protein that contains a hyaluronan-binding domain, and thus is a member of the hyaluronan-binding protein family. The hyaluronan-binding domain is known to be involved in extracellular matrix stability and cell migration. This protein has been shown to form a stable complex with inter-alpha-inhibitor (I alpha I), and thus enhance the serine protease inhibitory activity of I alpha I, which is important in the protease network associated with inflammation. This gene can be induced by proinflammatory cytokines such as tumor necrosis factor alpha and interleukin-1. Enhanced levels of this protein are found in the synovial fluid of patients with osteoarthritis and rheumatoid arthritis. ENSG00000123610 TNF alpha induced protein 6 7130 TNFAIP6 NA
Myosin is a hexameric ATPase cellular motor protein. It is composed of two myosin heavy chains, two nonphosphorylatable myosin alkali light chains, and two phosphorylatable myosin regulatory light chains. This gene encodes a myosin alkali light chain that is found in embryonic muscle and adult atria. Two alternatively spliced transcript variants encoding the same protein have been found for this gene. ENSG00000198336 myosin light chain 4 4635 MYL4 NA
This gene encodes a member of the cytochrome P450 superfamily of enzymes. The cytochrome P450 proteins are monooxygenases which catalyze many reactions involved in drug metabolism and synthesis of cholesterol, steroids and other lipids. This protein localizes to the endoplasmic reticulum. It has both 17alpha-hydroxylase and 17,20-lyase activities and is a key enzyme in the steroidogenic pathway that produces progestins, mineralocorticoids, glucocorticoids, androgens, and estrogens. Mutations in this gene are associated with isolated steroid-17 alpha-hydroxylase deficiency, 17-alpha-hydroxylase/17,20-lyase deficiency, pseudohermaphroditism, and adrenal hyperplasia. ENSG00000148795 cytochrome P450 family 17 subfamily A member 1 1586 CYP17A1 NA
NA ENSG00000175087 PDLIM1 interacting kinase 1 like 149420 PDIK1L NA
The protein encoded by this gene belongs to the calcium channel beta subunit family. It plays an important role in the calcium channel by modulating G protein inhibition, increasing peak calcium current, controlling the alpha-1 subunit membrane targeting and shifting the voltage dependence of activation and inactivation. Alternative splicing occurs at this locus and three transcript variants encoding three distinct isoforms have been identified. ENSG00000067191 calcium voltage-gated channel auxiliary subunit beta 1 782 CACNB1 NA
NA ENSG00000246082 nudix hydrolase 16 pseudogene 1 152195 NUDT16P1 NA
NA ENSG00000168274 NA NA NA TRUE
This gene encodes a member of the fascin family of actin-binding proteins. Fascin proteins organize F-actin into parallel bundles, and are required for the formation of actin-based cellular protrusions. The encoded protein plays a critical role in cell migration, motility, adhesion and cellular interactions. Expression of this gene is known to be regulated by several microRNAs, and overexpression of this gene may play a role in the metastasis of multiple types of cancer by increasing cell motility. Expression of this gene is also a marker for Reed-Sternberg cells in Hodgkin’s lymphoma. A pseudogene of this gene is located on the long arm of chromosome 15. ENSG00000075618 fascin actin-bundling protein 1 6624 FSCN1 NA
NA ENSG00000135362 proline rich 5 like 79899 PRR5L NA
Chondroadherin is a cartilage matrix protein thought to mediate adhesion of isolated chondrocytes. The protein contains 11 leucine-rich repeats flanked by cysteine-rich regions. The chondroadherin messenger RNA is present in chondrocytes at all ages. ENSG00000136457 chondroadherin 1101 CHAD NA
NA ENSG00000168389 major facilitator superfamily domain containing 2A 84879 MFSD2A NA
NA ENSG00000247134 NA ENSG00000247134 RP11-11N9.4 NA
The inhibin beta A subunit joins the alpha subunit to form a pituitary FSH secretion inhibitor. Inhibin has been shown to regulate gonadal stromal cell proliferation negatively and to have tumor-suppressor activity. In addition, serum levels of inhibin have been shown to reflect the size of granulosa-cell tumors and can therefore be used as a marker for primary as well as recurrent disease. Because expression in gonadal and various extragonadal tissues may vary severalfold in a tissue-specific fashion, it is proposed that inhibin may be both a growth/differentiation factor and a hormone. Furthermore, the beta A subunit forms a homodimer, activin A, and also joins with a beta B subunit to form a heterodimer, activin AB, both of which stimulate FSH secretion. Finally, it has been shown that the beta A subunit mRNA is identical to the erythroid differentiation factor subunit mRNA and that only one gene for this mRNA exists in the human genome. ENSG00000122641 inhibin beta A subunit 3624 INHBA NA
NA ENSG00000124701 apolipoprotein B mRNA editing enzyme catalytic subunit 2 10930 APOBEC2 NA
NA ENSG00000155363 Mov10 RISC complex RNA helicase 4343 MOV10 NA
NA ENSG00000182902 solute carrier family 25 member 18 83733 SLC25A18 NA
This gene encodes a member of the steroid-thyroid hormone-retinoid receptor superfamily. Expression is induced by phytohemagglutinin in human lymphocytes and by serum stimulation of arrested fibroblasts. The encoded protein acts as a nuclear transcription factor. Translocation of the protein from the nucleus to mitochondria induces apoptosis. Multiple transcript variants encoding different isoforms have been found for this gene. ENSG00000123358 nuclear receptor subfamily 4 group A member 1 3164 NR4A1 NA
This gene encodes an adenylate kinase enzyme involved in energy metabolism and homeostasis of cellular adenine nucleotide ratios in different intracellular compartments. This gene is highly expressed in skeletal muscle, brain and erythrocytes. Certain mutations in this gene resulting in a functionally inadequate enzyme are associated with a rare genetic disorder causing nonspherocytic hemolytic anemia. Alternative splicing of this gene results in multiple transcript variants encoding different isoforms. ENSG00000106992 adenylate kinase 1 203 AK1 NA
NA ENSG00000258782 NA ENSG00000258782 RP11-701B16.2 NA
NA ENSG00000235092 ID2 antisense RNA 1 (head to head) 100506299 ID2-AS1 NA
NA ENSG00000142765 synaptotagmin like 1 84958 SYTL1 NA
Transglutaminases are enzymes that catalyze the crosslinking of proteins by epsilon-gamma glutamyl lysine isopeptide bonds. While the primary structure of transglutaminases is not conserved, they all have the same amino acid sequence at their active sites and their activity is calcium-dependent. The protein encoded by this gene consists of two polypeptide chains activated from a single precursor protein by proteolysis. The encoded protein is involved in the later stages of cell envelope formation in the epidermis and hair follicle. ENSG00000125780 transglutaminase 3 7053 TGM3 NA
NA ENSG00000225075 NA ENSG00000225075 RP11-426L16.3 NA
This gene encodes a member of the dedicator of cytokinesis protein family. Members of this family are guanosine nucleotide exchange factors for Rho GTPases and defined by the presence of conserved DOCK-homology regions. The encoded protein belongs to the D (or Zizimin) subfamily of DOCK proteins, which also contain an N-terminal pleckstrin homology domain. Alternatively spliced transcript variants that encode different isoforms have been described. ENSG00000135905 dedicator of cytokinesis 10 55619 DOCK10 NA
This gene encodes a member of the serpin family of serine protease inhibitors. The protein is a major inhibitor of plasmin, which degrades fibrin and various other proteins. Consequently, the proper function of this gene has a major role in regulating the blood clotting pathway. Mutations in this gene result in alpha-2-plasmin inhibitor deficiency, which is characterized by severe hemorrhagic diathesis. Multiple transcript variants encoding different isoforms have been found for this gene. ENSG00000167711 serpin family F member 2 5345 SERPINF2 NA
The protein encoded by this gene is a glutathione-dependent prostaglandin E synthase. The expression of this gene has been shown to be induced by proinflammatory cytokine interleukin 1 beta (IL1B). Its expression can also be induced by tumor suppressor protein TP53, and may be involved in TP53 induced apoptosis. Knockout studies in mice suggest that this gene may contribute to the pathogenesis of collagen-induced arthritis and mediate acute pain during inflammatory responses. ENSG00000148344 prostaglandin E synthase 9536 PTGES NA
Troponin proteins associate with tropomyosin and regulate the calcium sensitivity of the myofibril contractile apparatus of striated muscles. Troponin I (TnI), along with troponin T (TnT) and troponin C (TnC), is one of 3 subunits that form the troponin complex of the thin filaments of striated muscle. TnI is the inhibitory subunit; blocking actin-myosin interactions and thereby mediating striated muscle relaxation. The TnI subfamily contains three genes: TnI-skeletal-fast-twitch, TnI-skeletal-slow-twitch, and TnI-cardiac. The TnI-fast and TnI-slow genes are expressed in fast-twitch and slow-twitch skeletal muscle fibers, respectively, while the TnI-cardiac gene is expressed exclusively in cardiac muscle tissue. This gene encodes the Troponin-I-skeletal-slow-twitch protein. This gene is expressed in cardiac and skeletal muscle during early development but is restricted to slow-twitch skeletal muscle fibers in adults. The encoded protein prevents muscle contraction by inhibiting calcium-mediated conformational changes in actin-myosin complexes. ENSG00000159173 troponin I1, slow skeletal type 7135 TNNI1 NA
NA ENSG00000237773 NA ENSG00000237773 AC003075.4 NA
The protein encoded by this gene is a mitochondrial phosphate-activated glutaminase that catalyzes the hydrolysis of glutamine to stoichiometric amounts of glutamate and ammonia. Originally thought to be liver-specific, this protein has been found in other tissues as well. Alternative splicing results in multiple transcript variants that encode different isoforms. ENSG00000135423 glutaminase 2 27165 GLS2 NA
NA ENSG00000259088 NA ENSG00000259088 CTD-2017C7.2 NA
NA ENSG00000213149 calponin 2 pseudogene 9 ENSG00000213149 CNN2P9 NA
# Save the Ensembl IDs queried for factor 12, one per line, for downstream use.
write.table(out$query, paste0("../utilities/GTEX2013_sparse_fac_voom/gene_names_clus_", 12, ".txt"),
            col.names = FALSE, row.names = FALSE, quote = FALSE)
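
The export step above is repeated by hand for every factor, with only the factor index changing. A minimal sketch of how it could be looped is given below. The helper `write_factor_genes`, the toy matrix `gene_list_toy`, and the use of `tempdir()` are assumptions made for illustration and are not part of the original analysis. The sketch also writes the IDs taken straight from the gene list rather than from `out$query`, so the two files could differ if the mygene result contains repeated query rows.

```r
# Hypothetical helper (not in the original analysis): write the Ensembl IDs
# of factor k to <out_dir>/gene_names_clus_<k>.txt, one ID per line.
write_factor_genes <- function(gene_ids, k, out_dir) {
  path <- file.path(out_dir, paste0("gene_names_clus_", k, ".txt"))
  write.table(gene_ids, path,
              col.names = FALSE, row.names = FALSE, quote = FALSE)
  path
}

# Toy stand-in for gene_list (one row per factor, one column per top gene).
# A real run would pass the full matrix built from SFA.ExtractTopFeatures.
gene_list_toy <- rbind(
  c("ENSG00000106537", "ENSG00000166147"),
  c("ENSG00000203782", "ENSG00000172867")
)

# Assumption: a scratch directory is used here instead of ../utilities/...
out_dir <- tempdir()

# Write one file per factor and collect the file paths.
paths <- vapply(seq_len(nrow(gene_list_toy)),
                function(k) write_factor_genes(gene_list_toy[k, ], k, out_dir),
                character(1))
```

Keeping the file naming in a single helper means the output directory and naming scheme only have to be changed in one place.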

Factor 13 Annotations

out <- mygene::queryMany(gene_list[13,],  scopes="ensembl.gene", fields=c("name", "summary", "symbol"), species="human");
## Finished
kable(as.data.frame(out))
symbol X_id name query summary
LOR 4014 loricrin ENSG00000203782 This gene encodes loricrin, a major protein component of the cornified cell envelope found in terminally differentiated epidermal cells. Mutations in this gene are associated with Vohwinkel’s syndrome and progressive symmetric erythrokeratoderma, both inherited skin diseases.
KRT2 3849 keratin 2 ENSG00000172867 The protein encoded by this gene is a member of the keratin gene family. The type II cytokeratins consist of basic or neutral proteins which are arranged in pairs of heterotypic keratin chains coexpressed during differentiation of simple and stratified epithelial tissues. This type II cytokeratin is expressed largely in the upper spinous layer of epidermal keratinocytes and mutations in this gene have been associated with bullous congenital ichthyosiform erythroderma. The type II cytokeratins are clustered in a region of chromosome 12q12-q13.
CDHR1 92211 cadherin related family member 1 ENSG00000148600 This gene belongs to the cadherin superfamily of calcium-dependent cell adhesion molecules. The encoded protein is a photoreceptor-specific cadherin that plays a role in outer segment disc morphogenesis. Mutations in this gene are associated with inherited retinal dystrophies. Alternatively spliced transcript variants encoding different isoforms have been identified.
TNNT2 7139 troponin T2, cardiac type ENSG00000118194 The protein encoded by this gene is the tropomyosin-binding subunit of the troponin complex, which is located on the thin filament of striated muscles and regulates muscle contraction in response to alterations in intracellular calcium ion concentration. Mutations in this gene have been associated with familial hypertrophic cardiomyopathy as well as with dilated cardiomyopathy. Transcripts for this gene undergo alternative splicing that results in many tissue-specific isoforms, however, the full-length nature of some of these variants has not yet been determined.
DMKN 93099 dermokine ENSG00000161249 This gene is upregulated in inflammatory diseases, and it was first observed as expressed in the differentiated layers of skin. The most interesting aspect of this gene is the differential use of promoters and terminators to generate isoforms with unique cellular distributions and domain components. Alternatively spliced transcript variants encoding different isoforms have been identified for this gene.
S100A14 57402 S100 calcium binding protein A14 ENSG00000189334 This gene encodes a member of the S100 protein family which contains an EF-hand motif and binds calcium. The gene is located in a cluster of S100 genes on chromosome 1. Levels of the encoded protein have been found to be lower in cancerous tissue and associated with metastasis suggesting a tumor suppressor function (PMID: 19956863, 19351828).
PKP3 11187 plakophilin 3 ENSG00000184363 This gene encodes a member of the arm-repeat (armadillo) and plakophilin gene families. Plakophilin proteins contain numerous armadillo repeats, localize to cell desmosomes and nuclei, and participate in linking cadherins to intermediate filaments in the cytoskeleton. This protein may act in cellular desmosome-dependent adhesion and signaling pathways. Two transcript variants encoding different isoforms have been found for this gene.
MUCL1 118430 mucin like 1 ENSG00000172551 NA
SAA1 6288 serum amyloid A1 ENSG00000173432 This gene encodes a member of the serum amyloid A family of apolipoproteins. The encoded preproprotein is proteolytically processed to generate the mature protein. This protein is a major acute phase protein that is highly expressed in response to inflammation and tissue injury. This protein also plays an important role in HDL metabolism and cholesterol homeostasis. High levels of this protein are associated with chronic inflammatory diseases including atherosclerosis, rheumatoid arthritis, Alzheimer’s disease and Crohn’s disease. This protein may also be a potential biomarker for certain tumors. Alternate splicing results in multiple transcript variants that encode the same protein. A pseudogene of this gene is found on chromosome 11.
KRT1 3848 keratin 1 ENSG00000167768 The protein encoded by this gene is a member of the keratin gene family. The type II cytokeratins consist of basic or neutral proteins which are arranged in pairs of heterotypic keratin chains coexpressed during differentiation of simple and stratified epithelial tissues. This type II cytokeratin is specifically expressed in the spinous and granular layers of the epidermis with family member KRT10 and mutations in these genes have been associated with bullous congenital ichthyosiform erythroderma. The type II cytokeratins are clustered in a region of chromosome 12q12-q13.
DCD 117159 dermcidin ENSG00000161634 This antimicrobial gene encodes a secreted protein that is subsequently processed into mature peptides of distinct biological activities. The C-terminal peptide is constitutively expressed in sweat and has antibacterial and antifungal activities. The N-terminal peptide, also known as diffusible survival evasion peptide, promotes neural cell survival under conditions of severe oxidative stress. A glycosylated form of the N-terminal peptide may be associated with cachexia (muscle wasting) in cancer patients. Alternative splicing results in multiple transcript variants encoding different isoforms.
NEBL 10529 nebulette ENSG00000078114 This gene encodes a nebulin like protein that is abundantly expressed in cardiac muscle. The encoded protein binds actin and interacts with thin filaments and Z-line associated proteins in striated muscle. This protein may be involved in cardiac myofibril assembly. A shorter isoform of this protein termed LIM nebulette is expressed in non-muscle cells and may function as a component of focal adhesion complexes. Alternate splicing results in multiple transcript variants.
MYCL 4610 v-myc avian myelocytomatosis viral oncogene lung carcinoma derived homolog ENSG00000116990 NA
PCK1 5105 phosphoenolpyruvate carboxykinase 1 ENSG00000124253 This gene is a main control point for the regulation of gluconeogenesis. The cytosolic enzyme encoded by this gene, along with GTP, catalyzes the formation of phosphoenolpyruvate from oxaloacetate, with the release of carbon dioxide and GDP. The expression of this gene can be regulated by insulin, glucocorticoids, glucagon, cAMP, and diet. Defects in this gene are a cause of cytosolic phosphoenolpyruvate carboxykinase deficiency. A mitochondrial isozyme of the encoded protein also has been characterized.
CALML5 51806 calmodulin like 5 ENSG00000178372 This gene encodes a novel calcium binding protein expressed in the epidermis and related to the calmodulin family of calcium binding proteins. Functional studies with recombinant protein demonstrate it does bind calcium and undergoes a conformational change when it does so. Abundant expression is detected only in reconstructed epidermis and is restricted to differentiating keratinocytes. In addition, it can associate with transglutaminase 3, shown to be a key enzyme in the terminal differentiation of keratinocytes.
C3orf52 79669 chromosome 3 open reading frame 52 ENSG00000114529 NA
THEM5 284486 thioesterase superfamily member 5 ENSG00000196407 NA
KRTDAP 388533 keratinocyte differentiation associated protein ENSG00000188508 This gene encodes a protein which may function in the regulation of keratinocyte differentiation and maintenance of stratified epithelia. Multiple transcript variants encoding different isoforms have been found for this gene.
SAA2-SAA4 100528017 SAA2-SAA4 readthrough ENSG00000255071 This locus represents naturally occurring read-through transcription between the neighboring serum amyloid A2 and serum amyloid A4 genes on chromosome 11. The read-through transcript produces a fusion protein that shares sequence identity with each individual gene product.
SAA2 6289 serum amyloid A2 ENSG00000134339 NA
HHATL 57467 hedgehog acyltransferase-like ENSG00000010282 NA
RHOV 171177 ras homolog family member V ENSG00000104140 NA
CDH1 999 cadherin 1 ENSG00000039068 This gene encodes a classical cadherin of the cadherin superfamily. Alternative splicing results in multiple transcript variants, at least one of which encodes a preproprotein that is proteolytically processed to generate the mature glycoprotein. This calcium-dependent cell-cell adhesion protein is comprised of five extracellular cadherin repeats, a transmembrane region and a highly conserved cytoplasmic tail. Mutations in this gene are correlated with gastric, breast, colorectal, thyroid and ovarian cancer. Loss of function of this gene is thought to contribute to cancer progression by increasing proliferation, invasion, and/or metastasis. The ectodomain of this protein mediates bacterial adhesion to mammalian cells and the cytoplasmic domain is required for internalization. This gene is present in a gene cluster with other members of the cadherin family on chromosome 16.
DSP 1832 desmoplakin ENSG00000096696 This gene encodes a protein that anchors intermediate filaments to desmosomal plaques and forms an obligate component of functional desmosomes. Mutations in this gene are the cause of several cardiomyopathies and keratodermas, including skin fragility-woolly hair syndrome. Alternative splicing results in multiple transcript variants.
RAP1GAP 5909 RAP1 GTPase activating protein ENSG00000076864 This gene encodes a type of GTPase-activating-protein (GAP) that down-regulates the activity of the ras-related RAP1 protein. RAP1 acts as a molecular switch by cycling between an inactive GDP-bound form and an active GTP-bound form. The product of this gene, RAP1GAP, promotes the hydrolysis of bound GTP and hence returns RAP1 to the inactive state whereas other proteins, guanine nucleotide exchange factors (GEFs), act as RAP1 activators by facilitating the conversion of RAP1 from the GDP- to the GTP-bound form. In general, ras subfamily proteins, such as RAP1, play key roles in receptor-linked signaling pathways that control cell growth and differentiation. RAP1 plays a role in diverse processes such as cell proliferation, adhesion, differentiation, and embryogenesis. Alternative splicing results in multiple transcript variants encoding distinct proteins.
TNNI3 7137 troponin I3, cardiac type ENSG00000129991 Troponin I (TnI), along with troponin T (TnT) and troponin C (TnC), is one of 3 subunits that form the troponin complex of the thin filaments of striated muscle. TnI is the inhibitory subunit; blocking actin-myosin interactions and thereby mediating striated muscle relaxation. The TnI subfamily contains three genes: TnI-skeletal-fast-twitch, TnI-skeletal-slow-twitch, and TnI-cardiac. This gene encodes the TnI-cardiac protein and is exclusively expressed in cardiac muscle tissues. Mutations in this gene cause familial hypertrophic cardiomyopathy type 7 (CMH7) and familial restrictive cardiomyopathy (RCM).
PIP 5304 prolactin induced protein ENSG00000159763 NA
EPS8L1 54869 EPS8 like 1 ENSG00000131037 This gene encodes a protein that is related to epidermal growth factor receptor pathway substrate 8 (EPS8), a substrate for the epidermal growth factor receptor. The function of this protein is unknown. At least two alternatively spliced transcript variants encoding different isoforms have been found for this gene.
ADH1B 125 alcohol dehydrogenase 1B (class I), beta polypeptide ENSG00000196616 The protein encoded by this gene is a member of the alcohol dehydrogenase family. Members of this enzyme family metabolize a wide variety of substrates, including ethanol, retinol, other aliphatic alcohols, hydroxysteroids, and lipid peroxidation products. This encoded protein, consisting of several homo- and heterodimers of alpha, beta, and gamma subunits, exhibits high activity for ethanol oxidation and plays a major role in ethanol catabolism. Three genes encoding alpha, beta and gamma subunits are tandemly organized in a genomic segment as a gene cluster. Two transcript variants encoding different isoforms have been found for this gene.
FNDC4 64838 fibronectin type III domain containing 4 ENSG00000115226 NA
CTSV 1515 cathepsin V ENSG00000136943 The protein encoded by this gene, a member of the peptidase C1 family, is a lysosomal cysteine proteinase that may play an important role in corneal physiology. This gene is expressed in colorectal and breast carcinomas but not in normal colon, mammary gland, or peritumoral tissues, suggesting a possible role for this gene in tumor processes. Alternatively spliced variants, encoding the same protein, have been identified.
SGPP2 130367 sphingosine-1-phosphate phosphatase 2 ENSG00000163082 The protein encoded by this gene is a transmembrane protein that degrades the bioactive signaling molecule sphingosine 1-phosphate. The encoded protein is induced during inflammatory responses and has been shown to be downregulated by the microRNA-31 tumor suppressor. Alternative splice variants encoding different isoforms have been found for this gene.
ANKRD1 27063 ankyrin repeat domain 1 ENSG00000148677 The protein encoded by this gene is localized to the nucleus of endothelial cells and is induced by IL-1 and TNF-alpha stimulation. Studies in rat cardiomyocytes suggest that this gene functions as a transcription factor. Interactions between this protein and the sarcomeric proteins myopalladin and titin suggest that it may also be involved in the myofibrillar stretch-sensor system.
ENO1P1 ENSG00000244457 enolase 1, (alpha) pseudogene 1 ENSG00000244457 NA
KRT10 3858 keratin 10 ENSG00000186395 This gene encodes a member of the type I (acidic) cytokeratin family, which belongs to the superfamily of intermediate filament (IF) proteins. Keratins are heteropolymeric structural proteins which form the intermediate filament. These filaments, along with actin microfilaments and microtubules, compose the cytoskeleton of epithelial cells. Mutations in this gene are associated with epidermolytic hyperkeratosis. This gene is located within a cluster of keratin family members on chromosome 17q21.
RAB25 57111 RAB25, member RAS oncogene family ENSG00000132698 The protein encoded by this gene is a member of the RAS superfamily of small GTPases. The encoded protein is involved in membrane trafficking and cell survival. This gene has been found to be a tumor suppressor and an oncogene, depending on the context. Two variants, one protein-coding and the other not, have been found for this gene.
RP11-517C16.2 ENSG00000261286 NA ENSG00000261286 NA
TNNC1 7134 troponin C1, slow skeletal and cardiac type ENSG00000114854 Troponin is a central regulatory protein of striated muscle contraction, and together with tropomyosin, is located on the actin filament. Troponin consists of 3 subunits: TnI, which is the inhibitor of actomyosin ATPase; TnT, which contains the binding site for tropomyosin; and TnC, the protein encoded by this gene. The binding of calcium to TnC abolishes the inhibitory action of TnI, thus allowing the interaction of actin with myosin, the hydrolysis of ATP, and the generation of tension. Mutations in this gene are associated with cardiomyopathy dilated type 1Z.
KRT14 3861 keratin 14 ENSG00000186847 This gene encodes a member of the keratin family, the most diverse group of intermediate filaments. This gene product, a type I keratin, is usually found as a heterotetramer with two keratin 5 molecules, a type II keratin. Together they form the cytoskeleton of epithelial cells. Mutations in the genes for these keratins are associated with epidermolysis bullosa simplex. At least one pseudogene has been identified at 17p12-p11.
PEBP4 157310 phosphatidylethanolamine binding protein 4 ENSG00000134020 The phosphatidylethanolamine (PE)-binding proteins, including PEBP4, are an evolutionarily conserved family of proteins with pivotal biologic functions, such as lipid binding and inhibition of serine proteases (Wang et al., 2004 [PubMed 15302887]).
AOX1 316 aldehyde oxidase 1 ENSG00000138356 Aldehyde oxidase produces hydrogen peroxide and, under certain conditions, can catalyze the formation of superoxide. Aldehyde oxidase is a candidate gene for amyotrophic lateral sclerosis.
B4GALNT3 283358 beta-1,4-N-acetyl-galactosaminyltransferase 3 ENSG00000139044 B4GALNT3 transfers N-acetylgalactosamine (GalNAc) onto glucosyl residues to form N,N-prime-diacetyllactosediamine (LacdiNAc, or LDN), a unique terminal structure of cell surface N-glycans (Ikehara et al., 2006 [PubMed 16728562]).
TNNT1 7138 troponin T1, slow skeletal type ENSG00000105048 This gene encodes a protein that is a subunit of troponin, which is a regulatory complex located on the thin filament of the sarcomere. This complex regulates striated muscle contraction in response to fluctuations in intracellular calcium concentration. This complex is composed of three subunits: troponin C, which binds calcium, troponin T, which binds tropomyosin, and troponin I, which is an inhibitory subunit. This protein is the slow skeletal troponin T subunit. Mutations in this gene cause nemaline myopathy type 5, also known as Amish nemaline myopathy, a neuromuscular disorder characterized by muscle weakness and rod-shaped, or nemaline, inclusions in skeletal muscle fibers which affects infants, resulting in death due to respiratory insufficiency, usually in the second year. Multiple transcript variants encoding different isoforms have been found for this gene.
SBSN 374897 suprabasin ENSG00000189001 NA
RP11-229P13.23 ENSG00000231864 NA ENSG00000231864 NA
PRSS8 5652 protease, serine 8 ENSG00000052344 This gene encodes a member of the peptidase S1 or chymotrypsin family of serine proteases. The encoded preproprotein is proteolytically processed to generate light and heavy chains that associate via a disulfide bond to form the heterodimeric enzyme. This enzyme is highly expressed in prostate epithelia and is one of several proteolytic enzymes found in seminal fluid. This protease exhibits trypsin-like substrate specificity, cleaving protein substrates at the carboxyl terminus of lysine or arginine residues. The encoded protease partially mediates proteolytic activation of the epithelial sodium channel, a regulator of sodium balance, and may also play a role in epithelial barrier formation.
NRAP 4892 nebulin related anchoring protein ENSG00000197893 NA
CAMK2B 816 calcium/calmodulin dependent protein kinase II beta ENSG00000058404 The product of this gene belongs to the serine/threonine protein kinase family and to the Ca(2+)/calmodulin-dependent protein kinase subfamily. Calcium signaling is crucial for several aspects of plasticity at glutamatergic synapses. In mammalian cells, the enzyme is composed of four different chains: alpha, beta, gamma, and delta. The product of this gene is a beta chain. It is possible that distinct isoforms of this chain have different cellular localizations and interact differently with calmodulin. Alternative splicing results in multiple transcript variants.
CTD-2201G16.1 ENSG00000258444 NA ENSG00000258444 NA
SOX9 6662 SRY-box 9 ENSG00000125398 The protein encoded by this gene recognizes the sequence CCTTGAG along with other members of the HMG-box class DNA-binding proteins. It acts during chondrocyte differentiation and, with steroidogenic factor 1, regulates transcription of the anti-Muellerian hormone (AMH) gene. Deficiencies lead to the skeletal malformation syndrome campomelic dysplasia, frequently with sex reversal.
FAM83H 286077 family with sequence similarity 83 member H ENSG00000180921 The protein encoded by this gene plays an important role in the structural development and calcification of tooth enamel. Defects in this gene are a cause of amelogenesis imperfecta type 3 (AI3).
SULT2B1 6820 sulfotransferase family 2B member 1 ENSG00000088002 Sulfotransferase enzymes catalyze the sulfate conjugation of many hormones, neurotransmitters, drugs, and xenobiotic compounds. These cytosolic enzymes are different in their tissue distributions and substrate specificities. The gene structure (number and length of exons) is similar among family members. This gene sulfates dehydroepiandrosterone but not 4-nitrophenol, a typical substrate for the phenol and estrogen sulfotransferase subfamilies. Two alternatively spliced variants that encode different isoforms have been described.
PFKFB2 5208 6-phosphofructo-2-kinase/fructose-2,6-biphosphatase 2 ENSG00000123836 The protein encoded by this gene is involved in both the synthesis and degradation of fructose-2,6-bisphosphate, a regulatory molecule that controls glycolysis in eukaryotes. The encoded protein has a 6-phosphofructo-2-kinase activity that catalyzes the synthesis of fructose-2,6-bisphosphate, and a fructose-2,6-biphosphatase activity that catalyzes the degradation of fructose-2,6-bisphosphate. This protein regulates fructose-2,6-bisphosphate levels in the heart, while a related enzyme encoded by a different gene regulates fructose-2,6-bisphosphate levels in the liver and muscle. This enzyme functions as a homodimer. Two transcript variants encoding two different isoforms have been found for this gene.
SAPCD2 89958 suppressor APC domain containing 2 ENSG00000186193 NA
LMO7 4008 LIM domain 7 ENSG00000136153 This gene encodes a protein containing a calponin homology (CH) domain, a PDZ domain, and a LIM domain, and may be involved in protein-protein interactions. Several alternatively spliced transcript variants encoding different isoforms have been found for this gene, however, the full-length nature of some variants is not known.
MYH7 4625 myosin, heavy chain 7, cardiac muscle, beta ENSG00000092054 Muscle myosin is a hexameric protein containing 2 heavy chain subunits, 2 alkali light chain subunits, and 2 regulatory light chain subunits. This gene encodes the beta (or slow) heavy chain subunit of cardiac myosin. It is expressed predominantly in normal human ventricle. It is also expressed in skeletal muscle tissues rich in slow-twitch type I muscle fibers. Changes in the relative abundance of this protein and the alpha (or fast) heavy subunit of cardiac myosin correlate with the contractile velocity of cardiac muscle. Its expression is also altered during thyroid hormone depletion and hemodynamic overloading. Mutations in this gene are associated with familial hypertrophic cardiomyopathy, myosin storage myopathy, dilated cardiomyopathy, and Laing early-onset distal myopathy.
PRKCZ 5590 protein kinase C zeta ENSG00000067606 Protein kinase C (PKC) zeta is a member of the PKC family of serine/threonine kinases which are involved in a variety of cellular processes such as proliferation, differentiation and secretion. Unlike the classical PKC isoenzymes which are calcium-dependent, PKC zeta exhibits a kinase activity which is independent of calcium and diacylglycerol but not of phosphatidylserine. Furthermore, it is insensitive to typical PKC inhibitors and cannot be activated by phorbol ester. Unlike the classical PKC isoenzymes, it has only a single zinc finger module. These structural and biochemical properties indicate that the zeta subspecies is related to, but distinct from other isoenzymes of PKC. Alternative splicing results in multiple transcript variants encoding different isoforms.
MB 4151 myoglobin ENSG00000198125 This gene encodes a member of the globin superfamily and is expressed in skeletal and cardiac muscles. The encoded protein is a haemoprotein contributing to intracellular oxygen storage and transcellular facilitated diffusion of oxygen. At least three alternatively spliced transcript variants encoding the same protein have been reported.
FNBP1P1 ENSG00000257800 formin binding protein 1 pseudogene 1 ENSG00000257800 NA
FBXL16 146330 F-box and leucine rich repeat protein 16 ENSG00000127585 Members of the F-box protein family, such as FBXL16, are characterized by an approximately 40-amino acid F-box motif. SCF complexes, formed by SKP1 (MIM 601434), cullin (see CUL1; MIM 603134), and F-box proteins, act as protein-ubiquitin ligases. F-box proteins interact with SKP1 through the F box, and they interact with ubiquitination targets through other protein interaction domains (Jin et al., 2004 [PubMed 15520277]).
SLC7A11 23657 solute carrier family 7 member 11 ENSG00000151012 This gene encodes a member of a heteromeric, sodium-independent, anionic amino acid transport system that is highly specific for cysteine and glutamate. In this system, designated Xc(-), the anionic form of cysteine is transported in exchange for glutamate. This protein has been identified as the predominant mediator of Kaposi sarcoma-associated herpesvirus fusion and entry permissiveness into cells. Also, increased expression of this gene in primary gliomas (compared to normal brain tissue) was associated with increased glutamate secretion via the XCT channels, resulting in neuronal cell death.
LY6G6C 80740 lymphocyte antigen 6 complex, locus G6C ENSG00000204421 LY6G6C belongs to a cluster of leukocyte antigen-6 (LY6) genes located in the major histocompatibility complex (MHC) class III region on chromosome 6. Members of the LY6 superfamily typically contain 70 to 80 amino acids, including 8 to 10 cysteines. Most LY6 proteins are attached to the cell surface by a glycosylphosphatidylinositol (GPI) anchor that is directly involved in signal transduction (Mallya et al., 2002 [PubMed 12079290]).
ALDH1A3 220 aldehyde dehydrogenase 1 family member A3 ENSG00000184254 This gene encodes an aldehyde dehydrogenase enzyme that uses retinal as a substrate. Mutations in this gene have been associated with microphthalmia, isolated 8, and expression changes have also been detected in tumor cells. Alternative splicing results in multiple transcript variants.
SLC2A1 6513 solute carrier family 2 member 1 ENSG00000117394 This gene encodes a major glucose transporter in the mammalian blood-brain barrier. The encoded protein is found primarily in the cell membrane and on the cell surface, where it can also function as a receptor for human T-cell leukemia virus (HTLV) I and II. Mutations in this gene have been found in a family with paroxysmal exertion-induced dyskinesia.
PLA2G2A 5320 phospholipase A2 group IIA ENSG00000188257 The protein encoded by this gene is a member of the phospholipase A2 family (PLA2). PLA2s constitute a diverse family of enzymes with respect to sequence, function, localization, and divalent cation requirements. This gene product belongs to group II, which contains the secreted form of PLA2, an extracellular enzyme that has a low molecular mass and requires calcium ions for catalysis. It catalyzes the hydrolysis of the sn-2 fatty acid acyl ester bond of phosphoglycerides, releasing free fatty acids and lysophospholipids, and is thought to participate in the regulation of the phospholipid metabolism in biomembranes. Several alternatively spliced transcript variants with different 5’ UTRs have been found for this gene.
FAM198A 729085 family with sequence similarity 198 member A ENSG00000144649 NA
MT1A 4489 metallothionein 1A ENSG00000205362 NA
MYL2 4633 myosin light chain 2 ENSG00000111245 This gene encodes the regulatory light chain associated with cardiac myosin beta (or slow) heavy chain. Ca+ triggers the phosphorylation of regulatory light chain that in turn triggers contraction. Mutations in this gene are associated with mid-left ventricular chamber type hypertrophic cardiomyopathy.
CLIC3 9022 chloride intracellular channel 3 ENSG00000169583 Chloride channels are a diverse group of proteins that regulate fundamental cellular processes including stabilization of cell membrane potential, transepithelial transport, maintenance of intracellular pH, and regulation of cell volume. Chloride intracellular channel 3 is a member of the p64 family and is predominantly localized in the nucleus and stimulates chloride ion channel activity. In addition, this protein may participate in cellular growth control, based on its association with ERK7, a member of the MAP kinase family.
TNFRSF19 55504 tumor necrosis factor receptor superfamily member 19 ENSG00000127863 The protein encoded by this gene is a member of the TNF-receptor superfamily. This receptor is highly expressed during embryonic development. It has been shown to interact with TRAF family members, and to activate JNK signaling pathway when overexpressed in cells. This receptor is capable of inducing apoptosis by a caspase-independent mechanism, and it is thought to play an essential role in embryonic development. Alternatively spliced transcript variants encoding distinct isoforms have been described.
CDO1 1036 cysteine dioxygenase type 1 ENSG00000129596 NA
SFN 2810 stratifin ENSG00000175793 NA
HSPB6 126393 heat shock protein family B (small) member 6 ENSG00000004776 This locus encodes a heat shock protein. The encoded protein likely plays a role in smooth muscle relaxation.
RAB11FIP4 84440 RAB11 family interacting protein 4 ENSG00000131242 Proteins of the large Rab GTPase family (see RAB1A; MIM 179508) have regulatory roles in the formation, targeting, and fusion of intracellular transport vesicles. RAB11FIP4 is one of many proteins that interact with and regulate Rab GTPases (Hales et al., 2001 [PubMed 11495908]).
VWA7 80737 von Willebrand factor A domain containing 7 ENSG00000204396 NA
OSBPL3 26031 oxysterol binding protein like 3 ENSG00000070882 This gene encodes a member of the oxysterol-binding protein (OSBP) family, a group of intracellular lipid receptors. Most members contain an N-terminal pleckstrin homology domain and a highly conserved C-terminal OSBP-like sterol-binding domain. The encoded protein is involved in the regulation of cell adhesion and organization of the actin cytoskeleton. Alternative splicing results in multiple transcript variants.
LGALS7B 653499 galectin 7B ENSG00000178934 The galectins are a family of beta-galactoside-binding proteins implicated in modulating cell-cell and cell-matrix interactions. Differential and in situ hybridization studies indicate that this lectin is specifically expressed in keratinocytes and found mainly in stratified squamous epithelium. A duplicate copy of this gene (GeneID:3963) is found adjacent to, but on the opposite strand on chromosome 19.
AC002398.12 ENSG00000267328 NA ENSG00000267328 NA
PKDCC 91461 protein kinase domain containing, cytoplasmic ENSG00000162878 NA
GSTO2 119391 glutathione S-transferase omega 2 ENSG00000065621 The protein encoded by this gene is an omega class glutathione S-transferase (GST). GSTs are involved in the metabolism of xenobiotics and carcinogens. Four transcript variants encoding different isoforms have been found for this gene.
AC132217.4 ENSG00000240801 NA ENSG00000240801 NA
TNS2 23371 tensin 2 ENSG00000111077 The protein encoded by this gene belongs to the tensin family. Tensin is a focal adhesion molecule that binds to actin filaments and participates in signaling pathways. This protein plays a role in regulating cell migration. Alternative splicing occurs at this locus and three transcript variants encoding three distinct isoforms have been identified.
STON1 11037 stonin 1 ENSG00000243244 Endocytosis of cell surface proteins is mediated by a complex molecular machinery that assembles on the inner surface of the plasma membrane. This gene encodes one of two human homologs of the Drosophila melanogaster stoned B protein. This protein is related to components of the endocytic machinery and exhibits a modular structure consisting of an N-terminal proline-rich domain, a central region of homology specific to the human stoned B-like proteins, and a C-terminal region homologous to the mu subunits of adaptor protein (AP) complexes. Read-through transcription of this gene into the neighboring downstream gene, which encodes TFIIA-alpha/beta-like factor, generates a transcript (SALF), which encodes a fusion protein comprised of sequence sharing identity with each individual gene product. Alternative splicing results in multiple transcript variants.
RARRES2 5919 retinoic acid receptor responder 2 ENSG00000106538 This gene encodes a secreted chemotactic protein that initiates chemotaxis via the ChemR23 G protein-coupled seven-transmembrane domain ligand. Expression of this gene is upregulated by the synthetic retinoid tazarotene and occurs in a wide variety of tissues. The active protein has several roles, including that as an adipokine and as an antimicrobial protein with activity against bacteria and fungi.
INPP5J 27124 inositol polyphosphate-5-phosphatase J ENSG00000185133 NA
CTC-550B14.7 ENSG00000267265 NA ENSG00000267265 NA
EPCAM 4072 epithelial cell adhesion molecule ENSG00000119888 This gene encodes a carcinoma-associated antigen and is a member of a family that includes at least two type I membrane proteins. This antigen is expressed on most normal epithelial cells and gastrointestinal carcinomas and functions as a homotypic calcium-independent cell adhesion molecule. The antigen is being used as a target for immunotherapy treatment of human carcinomas. Mutations in this gene result in congenital tufting enteropathy.
AF127936.9 ENSG00000235609 NA ENSG00000235609 NA
AF127577.10 ENSG00000229047 NA ENSG00000229047 NA
ZG16B 124220 zymogen granule protein 16B ENSG00000162078 NA
STEAP1 26872 six transmembrane epithelial antigen of the prostate 1 ENSG00000164647 This gene is predominantly expressed in prostate tissue, and is found to be upregulated in multiple cancer cell lines. The gene product is predicted to be a six-transmembrane protein, and was shown to be a cell surface antigen significantly expressed at cell-cell junctions.
PTH1R 5745 parathyroid hormone 1 receptor ENSG00000160801 The protein encoded by this gene is a member of the G-protein coupled receptor family 2. This protein is a receptor for parathyroid hormone (PTH) and for parathyroid hormone-like hormone (PTHLH). The activity of this receptor is mediated by G proteins which activate adenylyl cyclase and also a phosphatidylinositol-calcium second messenger system. Defects in this receptor are known to be the cause of Jansen’s metaphyseal chondrodysplasia (JMC), chondrodysplasia Blomstrand type (BOCD), as well as enchondromatosis. Two transcript variants encoding the same protein have been found for this gene.
MYH6 4624 myosin, heavy chain 6, cardiac muscle, alpha ENSG00000197616 Cardiac muscle myosin is a hexamer consisting of two heavy chain subunits, two light chain subunits, and two regulatory subunits. This gene encodes the alpha heavy chain subunit of cardiac myosin. The gene is located 4kb downstream of the gene encoding the beta heavy chain subunit of cardiac myosin. Mutations in this gene cause familial hypertrophic cardiomyopathy and atrial septal defect 3.
MAPKAPK3 7867 mitogen-activated protein kinase-activated protein kinase 3 ENSG00000114738 This gene encodes a member of the Ser/Thr protein kinase family. This kinase functions as a mitogen-activated protein kinase (MAP kinase)-activated protein kinase. MAP kinases, also known as extracellular signal-regulated kinases (ERKs), act as an integration point for multiple biochemical signals. This kinase was shown to be activated by growth inducers and stress stimulation of cells. In vitro studies demonstrated that ERK, p38 MAP kinase and Jun N-terminal kinase were all able to phosphorylate and activate this kinase, which suggested the role of this kinase as an integrative element of signaling in both mitogen and stress responses. This kinase was reported to interact with, phosphorylate and repress the activity of E47, which is a basic helix-loop-helix transcription factor known to be involved in the regulation of tissue-specific gene expression and cell differentiation. Alternate splicing results in multiple transcript variants that encode the same protein.
BMP1 649 bone morphogenetic protein 1 ENSG00000168487 This gene encodes a protein that is capable of inducing formation of cartilage in vivo. Although other bone morphogenetic proteins are members of the TGF-beta superfamily, this gene encodes a protein that is not closely related to other known growth factors. This gene is expressed as alternatively spliced variants that share an N-terminal protease domain but differ in their C-terminal region.
CSRP3 8048 cysteine and glycine rich protein 3 ENSG00000129170 This gene encodes a member of the CSRP family of LIM domain proteins, which may be involved in regulatory processes important for development and cellular differentiation. The LIM/double zinc-finger motif found in this protein is found in a group of proteins with critical functions in gene regulation, cell growth, and somatic differentiation. Mutations in this gene are thought to cause heritable forms of hypertrophic cardiomyopathy (HCM) and dilated cardiomyopathy (DCM) in humans. Alternatively spliced transcript variants with different 5’ UTR, but encoding the same protein, have been found for this gene.
RP4-564F22.5 ENSG00000224635 NA ENSG00000224635 NA
ANGPTL8 55908 angiopoietin like 8 ENSG00000130173 NA
TG 7038 thyroglobulin ENSG00000042832 Thyroglobulin (Tg) is a glycoprotein homodimer produced predominantly by the thyroid gland. It acts as a substrate for the synthesis of thyroxine and triiodothyronine as well as the storage of the inactive forms of thyroid hormone and iodine. Thyroglobulin is secreted from the endoplasmic reticulum to its site of iodination, and subsequent thyroxine biosynthesis, in the follicular lumen. Mutations in this gene cause thyroid dyshormonogenesis, manifested as goiter, and are associated with moderate to severe congenital hypothyroidism. Polymorphisms in this gene are associated with susceptibility to autoimmune thyroid diseases (AITD) such as Graves disease and Hashimoto thyroiditis.
ACTA1 58 actin, alpha 1, skeletal muscle ENSG00000143632 The product encoded by this gene belongs to the actin family of proteins, which are highly conserved proteins that play a role in cell motility, structure and integrity. Alpha, beta and gamma actin isoforms have been identified, with alpha actins being a major constituent of the contractile apparatus, while beta and gamma actins are involved in the regulation of cell motility. This actin is an alpha actin that is found in skeletal muscle. Mutations in this gene cause nemaline myopathy type 3, congenital myopathy with excess of thin myofilaments, congenital myopathy with cores, and congenital myopathy with fiber-type disproportion, diseases that lead to muscle fiber defects.
# Save the queried Ensembl IDs for factor 13, one per line, for downstream use.
write.table(as.factor(out$query),
            paste0("../utilities/GTEX2013_sparse_fac_voom/gene_names_clus_", 13, ".txt"),
            col.names = FALSE, row.names = FALSE, quote = FALSE)
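
Some of the annotation tables in this report include Ensembl IDs that mygene could not resolve (flagged TRUE in a `notfound` column), and the column order varies from factor to factor. The following is a minimal sketch of how the result could be tidied before it is passed to kable. The helper `tidy_annotations` and the `mock_out` data frame are hypothetical and not part of the original analysis; the mock stands in for a `queryMany()` result so that the example runs without network access.

```r
# Hypothetical helper (not part of the original analysis): put the mygene
# columns in a fixed order and separate resolved from unresolved IDs.
tidy_annotations <- function(out) {
  df <- as.data.frame(out)
  # Assumption: the `notfound` column is absent when every ID was matched,
  # as in the tables of this report that lack it.
  if (!"notfound" %in% names(df)) df$notfound <- NA
  missing <- !is.na(df$notfound) & df$notfound
  # Keep only the columns that are present, in a consistent order.
  cols <- intersect(c("query", "symbol", "name", "summary"), names(df))
  list(found = df[!missing, cols, drop = FALSE],
       missing = df$query[missing])
}

# Mock result standing in for a queryMany() call, so the sketch runs offline.
mock_out <- data.frame(
  query = c("ENSG00000164761", "ENSG00000117289", "ENSG00000121858"),
  symbol = c("TNFRSF11B", NA, "TNFSF10"),
  name = c("tumor necrosis factor receptor superfamily member 11b", NA,
           "tumor necrosis factor superfamily member 10"),
  notfound = c(NA, TRUE, NA),
  stringsAsFactors = FALSE
)

res <- tidy_annotations(mock_out)
res$missing  # IDs that returned no annotation
```

With a real query, `res$found` could be passed to `kable()` in place of `as.data.frame(out)`, and `res$missing` reviewed separately, for example to check whether the IDs have been retired from Ensembl.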

Factor 14 Annotations

out <- mygene::queryMany(gene_list[14,],  scopes="ensembl.gene", fields=c("name", "summary", "symbol"), species="human");
## Finished
## Pass returnall=TRUE to return lists of duplicate or missing query terms.
kable(as.data.frame(out))
symbol X_id query name summary notfound
TNFRSF11B 4982 ENSG00000164761 tumor necrosis factor receptor superfamily member 11b The protein encoded by this gene is a member of the TNF-receptor superfamily. This protein is an osteoblast-secreted decoy receptor that functions as a negative regulator of bone resorption. This protein specifically binds to its ligand, osteoprotegerin ligand, both of which are key extracellular regulators of osteoclast development. Studies of the mouse counterpart also suggest that this protein and its ligand play a role in lymph-node organogenesis and vascular calcification. Alternatively spliced transcript variants of this gene have been reported, but their full length nature has not been determined. NA
NA NA ENSG00000117289 NA NA TRUE
TNFSF10 8743 ENSG00000121858 tumor necrosis factor superfamily member 10 The protein encoded by this gene is a cytokine that belongs to the tumor necrosis factor (TNF) ligand family. This protein preferentially induces apoptosis in transformed and tumor cells, but does not appear to kill normal cells although it is expressed at a significant level in most normal tissues. This protein binds to several members of TNF receptor superfamily including TNFRSF10A/TRAILR1, TNFRSF10B/TRAILR2, TNFRSF10C/TRAILR3, TNFRSF10D/TRAILR4, and possibly also to TNFRSF11B/OPG. The activity of this protein may be modulated by binding to the decoy receptors TNFRSF10C/TRAILR3, TNFRSF10D/TRAILR4, and TNFRSF11B/OPG that cannot induce apoptosis. The binding of this protein to its receptors has been shown to trigger the activation of MAPK8/JNK, caspase 8, and caspase 3. Alternatively spliced transcript variants encoding different isoforms have been found for this gene. NA
SLC2A4 6517 ENSG00000181856 solute carrier family 2 member 4 This gene is a member of the solute carrier family 2 (facilitated glucose transporter) family and encodes a protein that functions as an insulin-regulated facilitative glucose transporter. In the absence of insulin, this integral membrane protein is sequestered within the cells of muscle and adipose tissue. Within minutes of insulin stimulation, the protein moves to the cell surface and begins to transport glucose across the cell membrane. Mutations in this gene have been associated with noninsulin-dependent diabetes mellitus (NIDDM). NA
HSPA7 ENSG00000225217 ENSG00000225217 heat shock protein family A (Hsp70) member 7 NA NA
GPR176 11245 ENSG00000166073 G protein-coupled receptor 176 Members of the G protein-coupled receptor family, such as GPR176, are cell surface receptors involved in responses to hormones, growth factors, and neurotransmitters (Hata et al., 1995 [PubMed 7893747]). NA
SNCA 6622 ENSG00000145335 synuclein alpha Alpha-synuclein is a member of the synuclein family, which also includes beta- and gamma-synuclein. Synucleins are abundantly expressed in the brain and alpha- and beta-synuclein inhibit phospholipase D2 selectively. SNCA may serve to integrate presynaptic signaling and membrane trafficking. Defects in SNCA have been implicated in the pathogenesis of Parkinson disease. SNCA peptides are a major component of amyloid plaques in the brains of patients with Alzheimer’s disease. Four alternatively spliced transcripts encoding two different isoforms have been identified for this gene. NA
P2RY1 5028 ENSG00000169860 purinergic receptor P2Y1 The product of this gene belongs to the family of G-protein coupled receptors. This family has several receptor subtypes with different pharmacological selectivity, which overlaps in some cases, for various adenosine and uridine nucleotides. This receptor functions as a receptor for extracellular ATP and ADP. In platelets binding to ADP leads to mobilization of intracellular calcium ions via activation of phospholipase C, a change in platelet shape, and probably to platelet aggregation. NA
ST6GALNAC2 10610 ENSG00000070731 ST6 N-acetylgalactosaminide alpha-2,6-sialyltransferase 2 ST6GALNAC2 belongs to a family of sialyltransferases that add sialic acids to the nonreducing ends of glycoconjugates. At the cell surface, these modifications have roles in cell-cell and cell-substrate interactions, bacterial adhesion, and protein targeting (Samyn-Petit et al., 2000 [PubMed 10742600]). NA
CST6 1474 ENSG00000175315 cystatin E/M The cystatin superfamily encompasses proteins that contain multiple cystatin-like sequences. Some of the members are active cysteine protease inhibitors, while others have lost or perhaps never acquired this inhibitory activity. There are three inhibitory families in the superfamily, including the type 1 cystatins (stefins), type 2 cystatins and the kininogens. The type 2 cystatin proteins are a class of cysteine proteinase inhibitors found in a variety of human fluids and secretions, where they appear to provide protective functions. This gene encodes a cystatin from the type 2 family, which is down-regulated in metastatic breast tumor cells as compared to primary tumor cells. Loss of expression is likely associated with the progression of a primary tumor to a metastatic phenotype. NA
MTHFD1L 25902 ENSG00000120254 methylenetetrahydrofolate dehydrogenase (NADP+ dependent) 1-like The protein encoded by this gene is involved in the synthesis of tetrahydrofolate (THF) in the mitochondrion. THF is important in the de novo synthesis of purines and thymidylate and in the regeneration of methionine from homocysteine. Several transcript variants encoding different isoforms have been found for this gene. NA
SERPINE1 5054 ENSG00000106366 serpin family E member 1 This gene encodes a member of the serine proteinase inhibitor (serpin) superfamily. This member is the principal inhibitor of tissue plasminogen activator (tPA) and urokinase (uPA), and hence is an inhibitor of fibrinolysis. Defects in this gene are the cause of plasminogen activator inhibitor-1 deficiency (PAI-1 deficiency), and high concentrations of the gene product are associated with thrombophilia. Alternatively spliced transcript variants encoding different isoforms have been found for this gene. NA
RP11-389C8.2 ENSG00000261269 ENSG00000261269 NA NA NA
ZNF215 7762 ENSG00000149054 zinc finger protein 215 NA NA
H19 283120 ENSG00000130600 H19, imprinted maternally expressed transcript (non-protein coding) This gene is located in an imprinted region of chromosome 11 near the insulin-like growth factor 2 (IGF2) gene. This gene is only expressed from the maternally-inherited chromosome, whereas IGF2 is only expressed from the paternally-inherited chromosome. The product of this gene is a long non-coding RNA which functions as a tumor suppressor. Mutations in this gene have been associated with Beckwith-Wiedemann Syndrome and Wilms tumorigenesis. Alternative splicing results in multiple transcript variants. NA
PLIN5 440503 ENSG00000214456 perilipin 5 Members of the perilipin family, such as PLIN5, coat intracellular lipid storage droplets and protect them from lipolytic degradation (Dalen et al., 2007 [PubMed 17234449]). NA
ADRB2 154 ENSG00000169252 adrenoceptor beta 2 This gene encodes beta-2-adrenergic receptor which is a member of the G protein-coupled receptor superfamily. This receptor is directly associated with one of its ultimate effectors, the class C L-type calcium channel Ca(V)1.2. This receptor-channel complex also contains a G protein, an adenylyl cyclase, cAMP-dependent kinase, and the counterbalancing phosphatase, PP2A. The assembly of the signaling complex provides a mechanism that ensures specific and rapid signaling by this G protein-coupled receptor. This gene is intronless. Different polymorphic forms, point mutations, and/or downregulation of this gene are associated with nocturnal asthma, obesity and type 2 diabetes. NA
RBP7 116362 ENSG00000162444 retinol binding protein 7 Due to its chemical instability and low solubility in aqueous solution, vitamin A requires cellular retinol-binding proteins (CRBPs), such as RBP7, for stability, internalization, intercellular transfer, homeostasis, and metabolism. NA
CARD14 79092 ENSG00000141527 caspase recruitment domain family member 14 This gene encodes a caspase recruitment domain-containing protein that is a member of the membrane-associated guanylate kinase (MAGUK) family of proteins. Members of this protein family are scaffold proteins that are involved in a diverse array of cellular processes including cellular adhesion, signal transduction and cell polarity control. This protein has been shown to specifically interact with BCL10, a protein known to function as a positive regulator of cell apoptosis and NF-kappaB activation. Alternate splicing results in multiple transcript variants. NA
HCP5 10866 ENSG00000206337 HLA complex P5 (non-protein coding) NA NA
SYNGR1 9145 ENSG00000100321 synaptogyrin 1 This gene encodes an integral membrane protein associated with presynaptic vesicles in neuronal cells. The exact function of this protein is unclear, but studies of a similar murine protein suggest that it functions in synaptic plasticity without being required for synaptic transmission. The gene product belongs to the synaptogyrin gene family. Three alternatively spliced variants encoding three different isoforms have been identified. NA
NDRG2 57447 ENSG00000165795 NDRG family member 2 This gene is a member of the N-myc downregulated gene family which belongs to the alpha/beta hydrolase superfamily. The protein encoded by this gene is a cytoplasmic protein that may play a role in neurite outgrowth. This gene may be involved in glioblastoma carcinogenesis. Several alternatively spliced transcript variants of this gene have been described, but the full-length nature of some of these variants has not been determined. NA
ITGA10 8515 ENSG00000143127 integrin subunit alpha 10 Integrins are integral transmembrane glycoproteins composed of noncovalently linked alpha and beta chains. They participate in cell adhesion as well as cell-surface mediated signalling. This gene encodes an integrin alpha chain and is expressed at high levels in chondrocytes, where it is transcriptionally regulated by AP-2epsilon and Ets-1. The protein encoded by this gene binds to collagen. Alternative splicing results in multiple transcript variants. NA
GIMAP5 55340 ENSG00000196329 GTPase, IMAP family member 5 This gene encodes a protein belonging to the GTP-binding superfamily and to the immuno-associated nucleotide (IAN) subfamily of nucleotide-binding proteins. In humans, the IAN subfamily genes are located in a cluster at 7q36.1. This gene encodes an antiapoptotic protein that functions in T-cell survival. Polymorphisms in this gene are associated with systemic lupus erythematosus. Read-through transcription exists between this gene and the neighboring upstream GIMAP1 (GTPase, IMAP family member 1) gene. NA
CTGF 1490 ENSG00000118523 connective tissue growth factor The protein encoded by this gene is a mitogen that is secreted by vascular endothelial cells. The encoded protein plays a role in chondrocyte proliferation and differentiation, cell adhesion in many cell types, and is related to platelet-derived growth factor. Certain polymorphisms in this gene have been linked with a higher incidence of systemic sclerosis. NA
NA NA ENSG00000241732 NA NA TRUE
ARHGAP25 9938 ENSG00000163219 Rho GTPase activating protein 25 ARHGAPs, such as ARHGAP25, encode negative regulators of Rho GTPases (see ARHA; MIM 165390), which are implicated in actin remodeling, cell polarity, and cell migration (Katoh and Katoh, 2004 [PubMed 15254788]). NA
ARHGEF4 50649 ENSG00000136002 Rho guanine nucleotide exchange factor 4 Rho GTPases play a fundamental role in numerous cellular processes that are initiated by extracellular stimuli that work through G protein coupled receptors. The protein encoded by this gene may form complex with G proteins and stimulate Rho-dependent signals. Multiple alternatively spliced transcript variants encoding different isoforms have been found, but the full-length nature of some variants has not been determined. NA
RP11-315I20.3 ENSG00000244619 ENSG00000244619 NA NA NA
TRIM63 84676 ENSG00000158022 tripartite motif containing 63 This gene encodes a member of the RING zinc finger protein family found in striated muscle and iris. The product of this gene is an E3 ubiquitin ligase that localizes to the Z-line and M-line lattices of myofibrils. This protein plays an important role in the atrophy of skeletal and cardiac muscle and is required for the degradation of myosin heavy chain proteins, myosin light chain, myosin binding protein, and for muscle-type creatine kinase. NA
ABCG1 9619 ENSG00000160179 ATP binding cassette subfamily G member 1 The protein encoded by this gene is a member of the superfamily of ATP-binding cassette (ABC) transporters. ABC proteins transport various molecules across extra- and intra-cellular membranes. ABC genes are divided into seven distinct subfamilies (ABC1, MDR/TAP, MRP, ALD, OABP, GCN20, White). This protein is a member of the White subfamily. It is involved in macrophage cholesterol and phospholipids transport, and may regulate cellular lipid homeostasis in other cell types. Six alternative splice variants have been identified. NA
SPRR2E 6704 ENSG00000203785 small proline rich protein 2E This gene encodes a member of a family of small proline-rich proteins clustered in the epidermal differentiation complex on chromosome 1q21. The encoded protein, along with other family members, is a component of the cornified cell envelope that forms beneath the plasma membrane in terminally differentiated stratified squamous epithelia. This envelope serves as a barrier against extracellular and environmental factors. The seven SPRR2 genes (A-G) appear to have been homogenized by gene conversion compared to others in the cluster that exhibit greater differences in protein structure. NA
TPH1 7166 ENSG00000129167 tryptophan hydroxylase 1 This gene encodes a member of the aromatic amino acid hydroxylase family. The encoded protein catalyzes the first and rate limiting step in the biosynthesis of serotonin, an important hormone and neurotransmitter. Mutations in this gene have been associated with an elevated risk for a variety of diseases and disorders, including schizophrenia, somatic anxiety, anger-related traits, bipolar disorder, suicidal behavior, addictions, and others. NA
RAI14 26064 ENSG00000039560 retinoic acid induced 14 NA NA
RHCG 51458 ENSG00000140519 Rh family C glycoprotein NA NA
BCAR1 9564 ENSG00000050820 BCAR1, Cas family scaffolding protein BCAR1, or CAS, is an Src (MIM 190090) family kinase substrate involved in various cellular events, including migration, survival, transformation, and invasion (Sawada et al., 2006 [PubMed 17129785]). NA
NMNAT3 349565 ENSG00000163864 nicotinamide nucleotide adenylyltransferase 3 This gene encodes a member of the nicotinamide/nicotinic acid mononucleotide adenylyltransferase family. These enzymes use ATP to catalyze the synthesis of nicotinamide adenine dinucleotide or nicotinic acid adenine dinucleotide from nicotinamide mononucleotide or nicotinic acid mononucleotide, respectively. The encoded protein is localized to mitochondria and may also play a neuroprotective role as a molecular chaperone. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene. NA
CELSR3 1951 ENSG00000008300 cadherin EGF LAG seven-pass G-type receptor 3 This gene belongs to the flamingo subfamily, which is included in the cadherin superfamily. The flamingo cadherins consist of nonclassic-type cadherins that do not interact with catenins. They are plasma membrane proteins containing seven epidermal growth factor-like repeats, nine cadherin domains and two laminin A G-type repeats in their ectodomain. They also have seven transmembrane domains, a characteristic feature of their subfamily. The encoded protein may be involved in the regulation of contact-dependent neurite growth and may play a role in tumor formation. NA
ABLIM1 3983 ENSG00000099204 actin binding LIM protein 1 This gene encodes a cytoskeletal LIM protein that binds to actin filaments via a domain that is homologous to erythrocyte dematin. LIM domains, found in over 60 proteins, play key roles in the regulation of developmental pathways. LIM domains also function as protein-binding interfaces, mediating specific protein-protein interactions. The protein encoded by this gene could mediate such interactions between actin filaments and cytoplasmic targets. Alternatively spliced transcript variants encoding different isoforms have been identified. NA
SOX15 6665 ENSG00000129194 SRY-box 15 This gene encodes a member of the SOX (SRY-related HMG-box) family of transcription factors involved in the regulation of embryonic development and in the determination of the cell fate. The encoded protein may act as a transcriptional regulator after forming a protein complex with other proteins. NA
SIPA1L2 57568 ENSG00000116991 signal induced proliferation associated 1 like 2 This gene encodes a member of the signal-induced proliferation-associated 1 like family. Members of this family contain a GTPase activating domain, a PDZ domain and a C-terminal coiled-coil domain with a leucine zipper. A similar protein in rat acts as a GTPases for the small GTPase Rap. NA
MYH10 4628 ENSG00000133026 myosin, heavy chain 10, non-muscle This gene encodes a member of the myosin superfamily. The protein represents a conventional non-muscle myosin; it should not be confused with the unconventional myosin-10 (MYO10). Myosins are actin-dependent motor proteins with diverse functions including regulation of cytokinesis, cell motility, and cell polarity. Mutations in this gene have been associated with May-Hegglin anomaly and developmental defects in brain and heart. Multiple transcript variants encoding different isoforms have been found for this gene. NA
VASN 114990 ENSG00000168140 vasorin NA NA
FXYD6 53826 ENSG00000137726 FXYD domain containing ion transport regulator 6 This gene encodes a member of the FXYD family of transmembrane proteins. This particular protein encodes phosphohippolin, which likely affects the activity of Na,K-ATPase. Multiple alternatively spliced transcript variants encoding the same protein have been described. Related pseudogenes have been identified on chromosomes 10 and X. Read-through transcripts have been observed between this locus and the downstream sodium/potassium-transporting ATPase subunit gamma (FXYD2, GeneID 486) locus. NA
AC084809.2 ENSG00000226377 ENSG00000226377 NA NA NA
CNFN 84518 ENSG00000105427 cornifelin NA NA
LRRN2 10446 ENSG00000170382 leucine rich repeat neuronal 2 The protein encoded by this gene belongs to the leucine-rich repeat superfamily. This gene was found to be amplified and overexpressed in malignant gliomas. The encoded protein has homology with other proteins that function as cell-adhesion molecules or as signal transduction receptors and is a candidate for the target gene in the 1q32.1 amplicon in malignant gliomas. Two alternatively spliced transcript variants encoding the same protein have been described for this gene. NA
CEACAM1 634 ENSG00000079385 carcinoembryonic antigen related cell adhesion molecule 1 This gene encodes a member of the carcinoembryonic antigen (CEA) gene family, which belongs to the immunoglobulin superfamily. Two subgroups of the CEA family, the CEA cell adhesion molecules and the pregnancy-specific glycoproteins, are located within a 1.2 Mb cluster on the long arm of chromosome 19. Eleven pseudogenes of the CEA cell adhesion molecule subgroup are also found in the cluster. The encoded protein was originally described in bile ducts of liver as biliary glycoprotein. Subsequently, it was found to be a cell-cell adhesion molecule detected on leukocytes, epithelia, and endothelia. The encoded protein mediates cell adhesion via homophilic as well as heterophilic binding to other proteins of the subgroup. Multiple cellular activities have been attributed to the encoded protein, including roles in the differentiation and arrangement of tissue three-dimensional structure, angiogenesis, apoptosis, tumor suppression, metastasis, and the modulation of innate and adaptive immune responses. Multiple transcript variants encoding different isoforms have been reported, but the full-length nature of all variants has not been defined. NA
TMCC3 57458 ENSG00000057704 transmembrane and coiled-coil domain family 3 NA NA
FCGR3B 2215 ENSG00000162747 Fc fragment of IgG receptor IIIb The protein encoded by this gene is a low affinity receptor for the Fc region of gamma immunoglobulins (IgG). The encoded protein acts as a monomer and can bind either monomeric or aggregated IgG. This gene may function to capture immune complexes in the peripheral circulation. Several transcript variants encoding different isoforms have been found for this gene. A highly-similar gene encoding a related protein is also found on chromosome 1. NA
SH2D3C 10044 ENSG00000095370 SH2 domain containing 3C This gene encodes an adaptor protein and member of a cytoplasmic protein family involved in cell migration. The encoded protein contains a putative Src homology 2 (SH2) domain and guanine nucleotide exchange factor-like domain which allows this signaling protein to form a complex with scaffolding protein Crk-associated substrate. Multiple transcript variants encoding different isoforms have been found for this gene. NA
N4BP3 23138 ENSG00000145911 NEDD4 binding protein 3 NA NA
CD34 947 ENSG00000174059 CD34 molecule The protein encoded by this gene may play a role in the attachment of stem cells to the bone marrow extracellular matrix or to stromal cells. This single-pass membrane protein is highly glycosylated and phosphorylated by protein kinase C. Two transcript variants encoding different isoforms have been found for this gene. NA
RNF125 54941 ENSG00000101695 ring finger protein 125 This gene encodes a novel E3 ubiquitin ligase that contains a RING finger domain in the N-terminus and three zinc-binding and one ubiquitin-interacting motif in the C-terminus. As a result of myristoylation, this protein associates with membranes and is primarily localized to intracellular membrane systems. The encoded protein may function as a positive regulator in the T-cell receptor signaling pathway. NA
TGM3 7053 ENSG00000125780 transglutaminase 3 Transglutaminases are enzymes that catalyze the crosslinking of proteins by epsilon-gamma glutamyl lysine isopeptide bonds. While the primary structure of transglutaminases is not conserved, they all have the same amino acid sequence at their active sites and their activity is calcium-dependent. The protein encoded by this gene consists of two polypeptide chains activated from a single precursor protein by proteolysis. The encoded protein is involved the later stages of cell envelope formation in the epidermis and hair follicle. NA
NEURL1B 54492 ENSG00000214357 neuralized E3 ubiquitin protein ligase 1B NA NA
RP11-688G15.3 ENSG00000258749 ENSG00000258749 NA NA NA
HIGD1B 51751 ENSG00000131097 HIG1 hypoxia inducible domain family member 1B This gene encodes a member of the hypoxia inducible gene 1 (HIG1) domain family. The encoded protein is localized to the cell membrane and has been linked to tumorigenesis and the progression of pituitary adenomas. Alternative splicing results in multiple transcript variants. NA
FGF11 2256 ENSG00000161958 fibroblast growth factor 11 The protein encoded by this gene is a member of the fibroblast growth factor (FGF) family. FGF family members possess broad mitogenic and cell survival activities, and are involved in a variety of biological processes, including embryonic development, cell growth, morphogenesis, tissue repair, tumor growth and invasion. The function of this gene has not yet been determined. The expression pattern of the mouse homolog implies a role in nervous system development. Alternative splicing results in multiple transcript variants. NA
RP11-673E1.3 ENSG00000249741 ENSG00000249741 NA NA NA
DOCK8 81704 ENSG00000107099 dedicator of cytokinesis 8 This gene encodes a member of the DOCK180 family of guanine nucleotide exchange factors. Guanine nucleotide exchange factors interact with Rho GTPases and are components of intracellular signaling networks. Mutations in this gene result in the autosomal recessive form of the hyper-IgE syndrome. Alternatively spliced transcript variants encoding different isoforms have been described. NA
PLBD1 79887 ENSG00000121316 phospholipase B domain containing 1 NA NA
S100A9 6280 ENSG00000163220 S100 calcium binding protein A9 The protein encoded by this gene is a member of the S100 family of proteins containing 2 EF-hand calcium-binding motifs. S100 proteins are localized in the cytoplasm and/or nucleus of a wide range of cells, and involved in the regulation of a number of cellular processes such as cell cycle progression and differentiation. S100 genes include at least 13 members which are located as a cluster on chromosome 1q21. This protein may function in the inhibition of casein kinase and altered expression of this protein is associated with the disease cystic fibrosis. This antimicrobial protein exhibits antifungal and antibacterial activity. NA
SMIM5 643008 ENSG00000204323 small integral membrane protein 5 NA NA
CTHRC1 115908 ENSG00000164932 collagen triple helix repeat containing 1 This locus encodes a protein that may play a role in the cellular response to arterial injury through involvement in vascular remodeling. Mutations at this locus have been associated with Barrett esophagus and esophageal adenocarcinoma. Alternatively spliced transcript variants have been described. NA
C1QB 713 ENSG00000173369 complement component 1, q subcomponent, B chain This gene encodes a major constituent of the human complement subcomponent C1q. C1q associates with C1r and C1s in order to yield the first component of the serum complement system. Deficiency of C1q has been associated with lupus erythematosus and glomerulonephritis. C1q is composed of 18 polypeptide chains: six A-chains, six B-chains, and six C-chains. Each chain contains a collagen-like region located near the N terminus and a C-terminal globular region. The A-, B-, and C-chains are arranged in the order A-C-B on chromosome 1. This gene encodes the B-chain polypeptide of human complement subcomponent C1q NA
PCP4L1 654790 ENSG00000248485 Purkinje cell protein 4 like 1 NA NA
CSTA 1475 ENSG00000121552 cystatin A The cystatin superfamily encompasses proteins that contain multiple cystatin-like sequences. Some of the members are active cysteine protease inhibitors, while others have lost or perhaps never acquired this inhibitory activity. There are three inhibitory families in the superfamily, including the type 1 cystatins (stefins), type 2 cystatins, and kininogens. This gene encodes a stefin that functions as a cysteine protease inhibitor, forming tight complexes with papain and the cathepsins B, H, and L. The protein is one of the precursor proteins of cornified cell envelope in keratinocytes and plays a role in epidermal development and maintenance. Stefins have been proposed as prognostic and diagnostic tools for cancer. NA
POPDC2 64091 ENSG00000121577 popeye domain containing 2 This gene encodes a member of the POP family of proteins which contain three putative transmembrane domains. This membrane associated protein is predominantly expressed in skeletal and cardiac muscle, and may have an important function in these tissues. NA
CTD-2201I18.1 101929215 ENSG00000249825 uncharacterized LOC101929215 NA NA
GAS6-AS1 ENSG00000233695 ENSG00000233695 GAS6 antisense RNA 1 NA NA
KLHDC8B 200942 ENSG00000185909 kelch domain containing 8B This gene encodes a protein which forms a distinct beta-propeller protein structure of kelch domains allowing for protein-protein interactions. Mutations in this gene have been associated with Hodgkin lymphoma. NA
NA NA ENSG00000180672 NA NA TRUE
KRT5 3852 ENSG00000186081 keratin 5 The protein encoded by this gene is a member of the keratin gene family. The type II cytokeratins consist of basic or neutral proteins which are arranged in pairs of heterotypic keratin chains coexpressed during differentiation of simple and stratified epithelial tissues. This type II cytokeratin is specifically expressed in the basal layer of the epidermis with family member KRT14. Mutations in these genes have been associated with a complex of diseases termed epidermolysis bullosa simplex. The type II cytokeratins are clustered in a region of chromosome 12q12-q13. NA
RP11-728F11.4 ENSG00000254528 ENSG00000254528 NA NA NA
SLC1A1 6505 ENSG00000106688 solute carrier family 1 member 1 This gene encodes a member of the high-affinity glutamate transporters that play an essential role in transporting glutamate across plasma membranes. In brain, these transporters are crucial in terminating the postsynaptic action of the neurotransmitter glutamate, and in maintaining extracellular glutamate concentrations below neurotoxic levels. This transporter also transports aspartate, and mutations in this gene are thought to cause dicarboxylicamino aciduria, also known as glutamate-aspartate transport defect. NA
NCF2 4688 ENSG00000116701 neutrophil cytosolic factor 2 This gene encodes neutrophil cytosolic factor 2, the 67-kilodalton cytosolic subunit of the multi-protein NADPH oxidase complex found in neutrophils. This oxidase produces a burst of superoxide which is delivered to the lumen of the neutrophil phagosome. Mutations in this gene, as well as in other NADPH oxidase subunits, can result in chronic granulomatous disease, a disease that causes recurrent infections by catalase-positive organisms. Alternative splicing results in multiple transcript variants encoding different isoforms. NA
TCP11L2 255394 ENSG00000166046 t-complex 11 like 2 NA NA
MYBL1 4603 ENSG00000185697 MYB proto-oncogene like 1 NA NA
THY1 7070 ENSG00000154096 Thy-1 cell surface antigen This gene encodes a cell surface glycoprotein and member of the immunoglobulin superfamily of proteins. The encoded protein is involved in cell adhesion and cell communication in numerous cell types, but particularly in cells of the immune and nervous systems. The encoded protein is widely used as a marker for hematopoietic stem cells. This gene may function as a tumor suppressor in nasopharyngeal carcinoma. Alternative splicing results in multiple transcript variants. NA
CPT1B 1375 ENSG00000205560 carnitine palmitoyltransferase 1B The protein encoded by this gene, a member of the carnitine/choline acetyltransferase family, is the rate-controlling enzyme of the long-chain fatty acid beta-oxidation pathway in muscle mitochondria. This enzyme is required for the net transport of long-chain fatty acyl-CoAs from the cytoplasm into the mitochondria. Multiple transcript variants encoding different isoforms have been found for this gene, and read-through transcripts are expressed from the upstream locus that include exons from this gene. NA
CHL1 10752 ENSG00000134121 cell adhesion molecule L1 like The protein encoded by this gene is a member of the L1 gene family of neural cell adhesion molecules. It is a neural recognition molecule that may be involved in signal transduction pathways. The deletion of one copy of this gene may be responsible for mental defects in patients with 3p- syndrome. This protein may also play a role in the growth of certain cancers. Alternate splicing results in both coding and non-coding variants. NA
RP11-334E6.12 ENSG00000263873 ENSG00000263873 NA NA NA
RP11-350G8.9 ENSG00000273110 ENSG00000273110 NA NA NA
MYO15B ENSG00000266714 ENSG00000266714 myosin XVB NA NA
LMO7 4008 ENSG00000136153 LIM domain 7 This gene encodes a protein containing a calponin homology (CH) domain, a PDZ domain, and a LIM domain, and may be involved in protein-protein interactions. Several alternatively spliced transcript variants encoding different isoforms have been found for this gene, however, the full-length nature of some variants is not known. NA
ABI3 51225 ENSG00000108798 ABI family member 3 This gene encodes a member of an adaptor protein family. Members of this family encode proteins containing a homeobox homology domain, proline rich region and Src-homology 3 (SH3) domain, and are components of the Abi/WAVE complex which regulates actin polymerization. The encoded protein inhibits ectopic metastasis of tumor cells as well as cell migration. This may be accomplished through interaction with p21-activated kinase. Alternative splicing results in multiple transcript variants. NA
NCF4 4689 ENSG00000100365 neutrophil cytosolic factor 4 The protein encoded by this gene is a cytosolic regulatory component of the superoxide-producing phagocyte NADPH-oxidase, a multicomponent enzyme system important for host defense. This protein is preferentially expressed in cells of myeloid lineage. It interacts primarily with neutrophil cytosolic factor 2 (NCF2/p67-phox) to form a complex with neutrophil cytosolic factor 1 (NCF1/p47-phox), which further interacts with the small G protein RAC1 and translocates to the membrane upon cell stimulation. This complex then activates flavocytochrome b, the membrane-integrated catalytic core of the enzyme system. The PX domain of this protein can bind phospholipid products of the PI(3) kinase, which suggests its role in PI(3) kinase-mediated signaling events. The phosphorylation of this protein was found to negatively regulate the enzyme activity. Alternatively spliced transcript variants encoding distinct isoforms have been observed. NA
MYO1D 4642 ENSG00000176658 myosin ID NA NA
FN1 2335 ENSG00000115414 fibronectin 1 This gene encodes fibronectin, a glycoprotein present in a soluble dimeric form in plasma, and in a dimeric or multimeric form at the cell surface and in extracellular matrix. The encoded preproprotein is proteolytically processed to generate the mature protein. Fibronectin is involved in cell adhesion and migration processes including embryogenesis, wound healing, blood coagulation, host defense, and metastasis. The gene has three regions subject to alternative splicing, with the potential to produce 20 different transcript variants, at least one of which encodes an isoform that undergoes proteolytic processing. The full-length nature of some variants has not been determined. NA
STAB1 23166 ENSG00000010327 stabilin 1 This gene encodes a large, transmembrane receptor protein which may function in angiogenesis, lymphocyte homing, cell adhesion, or receptor scavenging. The protein contains 7 fasciclin, 16 epidermal growth factor (EGF)-like, and 2 laminin-type EGF-like domains as well as a C-type lectin-like hyaluronan-binding Link module. The protein is primarily expressed on sinusoidal endothelial cells of liver, spleen, and lymph node. The receptor has been shown to endocytose ligands such as low density lipoprotein, Gram-positive and Gram-negative bacteria, and advanced glycosylation end products. Supporting its possible role as a scavenger receptor, the protein rapidly cycles between the plasma membrane and early endosomes. NA
OLR1 4973 ENSG00000173391 oxidized low density lipoprotein receptor 1 This gene encodes a low density lipoprotein receptor that belongs to the C-type lectin superfamily. This gene is regulated through the cyclic AMP signaling pathway. The encoded protein binds, internalizes and degrades oxidized low-density lipoprotein. This protein may be involved in the regulation of Fas-induced apoptosis. This protein may play a role as a scavenger receptor. Mutations of this gene have been associated with atherosclerosis, risk of myocardial infarction, and may modify the risk of Alzheimer’s disease. Alternate splicing results in multiple transcript variants. NA
RP11-169D4.2 ENSG00000256633 ENSG00000256633 NA NA NA
TMEM88 92162 ENSG00000167874 transmembrane protein 88 NA NA
IGFBP2 3485 ENSG00000115457 insulin like growth factor binding protein 2 The protein encoded by this gene is one of six similar proteins that bind insulin-like growth factors I and II (IGF-I and IGF-II). The encoded protein can be secreted into the bloodstream, where it binds IGF-I and IGF-II with high affinity, or it can remain intracellular, interacting with many different ligands. High expression levels of this protein promote the growth of several types of tumors and may be predictive of the chances of recovery of the patient. Several transcript variants, one encoding a secreted isoform and the others encoding nonsecreted isoforms, have been found for this gene. NA
EGLN3 112399 ENSG00000129521 egl-9 family hypoxia inducible factor 3 NA NA
SYNM 23336 ENSG00000182253 synemin The protein encoded by this gene is an intermediate filament (IF) family member. IF proteins are cytoskeletal proteins that confer resistance to mechanical stress and are encoded by a dispersed multigene family. This protein has been found to form a linkage between desmin, which is a subunit of the IF network, and the extracellular matrix, and provides an important structural support in muscle. Two alternatively spliced variants encoding different isoforms have been described for this gene. NA
SLC39A14 23516 ENSG00000104635 solute carrier family 39 member 14 Zinc is an essential cofactor for hundreds of enzymes. It is involved in protein, nucleic acid, carbohydrate, and lipid metabolism, as well as in the control of gene transcription, growth, development, and differentiation. SLC39A14 belongs to a subfamily of proteins that show structural characteristics of zinc transporters (Taylor and Nicholson, 2003 [PubMed 12659941]). NA
ATP1B2 482 ENSG00000129244 ATPase Na+/K+ transporting subunit beta 2 The protein encoded by this gene belongs to the family of Na+/K+ and H+/K+ ATPases beta chain proteins, and to the subfamily of Na+/K+ -ATPases. Na+/K+ -ATPase is an integral membrane protein responsible for establishing and maintaining the electrochemical gradients of Na and K ions across the plasma membrane. These gradients are essential for osmoregulation, for sodium-coupled transport of a variety of organic and inorganic molecules, and for electrical excitability of nerve and muscle. This enzyme is composed of two subunits, a large catalytic subunit (alpha) and a smaller glycoprotein subunit (beta). The beta subunit regulates, through assembly of alpha/beta heterodimers, the number of sodium pumps transported to the plasma membrane. The glycoprotein subunit of Na+/K+ -ATPase is encoded by multiple genes. This gene encodes a beta 2 subunit. Two transcript variants encoding different isoforms have been found for this gene. NA
CTB-79E8.2 ENSG00000253445 ENSG00000253445 NA NA NA
# Save the Ensembl IDs queried for factor 14, one per line
write.table(out$query, paste0("../utilities/GTEX2013_sparse_fac_voom/gene_names_clus_", 14, ".txt"),
            col.names = FALSE, row.names = FALSE, quote = FALSE)
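
The query-and-export step is repeated almost verbatim for every factor. A minimal sketch of how it could be wrapped in one function is given here, assuming the same `gene_list` matrix and output directory as in the code above. The names `annotate_factor`, `out_dir`, and `query_fun` are hypothetical and not part of the original analysis. The query function is passed as an argument so the helper can be tried with a stub, without network access to the MyGene.info service.

```r
# Hypothetical helper (not part of the original analysis): annotate the top
# genes of factor k and write the queried Ensembl IDs to a text file.
annotate_factor <- function(k, gene_list, out_dir,
                            query_fun = function(ids) {
                              # Same mygene call as used for each factor above
                              mygene::queryMany(ids, scopes = "ensembl.gene",
                                                fields = c("name", "summary", "symbol"),
                                                species = "human")
                            }) {
  # Row k of gene_list holds the top genes of factor k
  out <- as.data.frame(query_fun(gene_list[k, ]))

  # One Ensembl ID per line, named after the factor index
  out_file <- file.path(out_dir, paste0("gene_names_clus_", k, ".txt"))
  write.table(out$query, out_file,
              col.names = FALSE, row.names = FALSE, quote = FALSE)

  # Return the annotation table invisibly so it can be passed to kable()
  invisible(out)
}
```

Under these assumptions, `kable(annotate_factor(15, gene_list, "../utilities/GTEX2013_sparse_fac_voom"))` would reproduce the two steps for factor 15, and `lapply(seq_len(nrow(gene_list)), annotate_factor, gene_list = gene_list, out_dir = "../utilities/GTEX2013_sparse_fac_voom")` would cover all factors in one pass.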

Factor 15 Annotations

out <- mygene::queryMany(gene_list[15,],  scopes="ensembl.gene", fields=c("name", "summary", "symbol"), species="human");
## Finished
## Pass returnall=TRUE to return lists of duplicate or missing query terms.
kable(as.data.frame(out))
summary X_id symbol name query notfound
The protein encoded by this gene is a member of the CXC chemokine family. This chemokine is one of the major mediators of the inflammatory response. This chemokine is secreted by several cell types. It functions as a chemoattractant, and is also a potent angiogenic factor. This gene is believed to play a role in the pathogenesis of bronchiolitis, a common respiratory tract disease caused by viral infection. This gene and other ten members of the CXC chemokine gene family form a chemokine gene cluster in a region mapped to chromosome 4q. 3576 CXCL8 C-X-C motif chemokine ligand 8 ENSG00000169429 NA
The protein encoded by this gene coats lipid storage droplets in adipocytes, thereby protecting them until they can be broken down by hormone-sensitive lipase. The encoded protein is the major cAMP-dependent protein kinase substrate in adipocytes and, when unphosphorylated, may play a role in the inhibition of lipolysis. Alternatively spliced transcript variants varying in the 5’ UTR, but encoding the same protein, have been found for this gene. 5346 PLIN1 perilipin 1 ENSG00000166819 NA
Members of the F-box protein family, such as FBXO27, are characterized by an approximately 40-amino acid F-box motif. SCF complexes, formed by SKP1 (MIM 601434), cullin (see CUL1; MIM 603134), and F-box proteins, act as protein-ubiquitin ligases. F-box proteins interact with SKP1 through the F box, and they interact with ubiquitination targets through other protein interaction domains (Jin et al., 2004 [PubMed 15520277]). 126433 FBXO27 F-box protein 27 ENSG00000161243 NA
NA 81691 LOC81691 exonuclease NEF-sp ENSG00000005189 NA
This gene encodes a low density lipoprotein receptor that belongs to the C-type lectin superfamily. This gene is regulated through the cyclic AMP signaling pathway. The encoded protein binds, internalizes and degrades oxidized low-density lipoprotein. This protein may be involved in the regulation of Fas-induced apoptosis. This protein may play a role as a scavenger receptor. Mutations of this gene have been associated with atherosclerosis, risk of myocardial infarction, and may modify the risk of Alzheimer’s disease. Alternate splicing results in multiple transcript variants. 4973 OLR1 oxidized low density lipoprotein receptor 1 ENSG00000173391 NA
This gene encodes a member of the serine proteinase inhibitor (serpin) superfamily. This member is the principal inhibitor of tissue plasminogen activator (tPA) and urokinase (uPA), and hence is an inhibitor of fibrinolysis. Defects in this gene are the cause of plasminogen activator inhibitor-1 deficiency (PAI-1 deficiency), and high concentrations of the gene product are associated with thrombophilia. Alternatively spliced transcript variants encoding different isoforms have been found for this gene. 5054 SERPINE1 serpin family E member 1 ENSG00000106366 NA
Spectrins are principle components of a cell’s membrane-cytoskeleton and are composed of two alpha and two beta spectrin subunits. The protein encoded by this gene (SPTBN2), is called spectrin beta non-erythrocytic 2 or beta-III spectrin. It is related to, but distinct from, the beta-II spectrin gene which is also known as spectrin beta non-erythrocytic 1 (SPTBN1). SPTBN2 regulates the glutamate signaling pathway by stabilizing the glutamate transporter EAAT4 at the surface of the plasma membrane. Mutations in this gene cause a form of spinocerebellar ataxia, SCA5, that is characterized by neurodegeneration, progressive locomotor incoordination, dysarthria, and uncoordinated eye movements. 6712 SPTBN2 spectrin beta, non-erythrocytic 2 ENSG00000173898 NA
The delta (HBD) and beta (HBB) genes are normally expressed in the adult: two alpha chains plus two beta chains constitute HbA, which in normal adult life comprises about 97% of the total hemoglobin. Two alpha chains plus two delta chains constitute HbA-2, which with HbF comprises the remaining 3% of adult hemoglobin. Five beta-like globin genes are found within a 45 kb cluster on chromosome 11 in the following order: 5’-epsilon–Ggamma–Agamma–delta–beta-3’. Mutations in the delta-globin gene are associated with beta-thalassemia. 3045 HBD hemoglobin subunit delta ENSG00000223609 NA
NA 78995 C17orf53 chromosome 17 open reading frame 53 ENSG00000125319 NA
NA ENSG00000262001 DLGAP1-AS2 DLGAP1 antisense RNA 2 ENSG00000262001 NA
NA ENSG00000267992 CTB-189B5.3 NA ENSG00000267992 NA
This gene is a member of the N-myc downregulated gene family which belongs to the alpha/beta hydrolase superfamily. The protein encoded by this gene is a cytoplasmic protein that is required for cell cycle progression and survival in primary astrocytes and may be involved in the regulation of mitogenic signalling in vascular smooth muscle cells. Alternative splicing results in multiple transcripts encoding different isoforms. 65009 NDRG4 NDRG family member 4 ENSG00000103034 NA
NA 9796 PHYHIP phytanoyl-CoA 2-hydroxylase interacting protein ENSG00000168490 NA
Members of the perilipin family, such as PLIN4, coat intracellular lipid storage droplets (Wolins et al., 2003 [PubMed 12840023]). 729359 PLIN4 perilipin 4 ENSG00000167676 NA
NA 80162 ATHL1 ATH1, acid trehalase-like 1 (yeast) ENSG00000142102 NA
This gene encodes one of several deubiquitylating enzymes. Ubiquitin modification of proteins is needed for their stability and function; to reverse the process, deubiquitylating enzymes remove ubiquitin. This protein contains an OTU domain and binds Ubal (ubiquitin aldehyde); an active cysteine protease site is present in the OTU domain. 78990 OTUB2 OTU deubiquitinase, ubiquitin aldehyde binding 2 ENSG00000089723 NA
NA ENSG00000271857 RP1-244F24.1 NA ENSG00000271857 NA
NA ENSG00000255507 RP11-535A19.2 NA ENSG00000255507 NA
The protein encoded by this gene is secreted and is a serine protease inhibitor whose targets include elastase, plasmin, thrombin, trypsin, chymotrypsin, and plasminogen activator. Defects in this gene can cause emphysema or liver disease. Several transcript variants encoding the same protein have been found for this gene. 5265 SERPINA1 serpin family A member 1 ENSG00000197249 NA
The protein encoded by this gene is involved in both the synthesis and degradation of fructose-2,6-bisphosphate, a regulatory molecule that controls glycolysis in eukaryotes. The encoded protein has a 6-phosphofructo-2-kinase activity that catalyzes the synthesis of fructose-2,6-bisphosphate, and a fructose-2,6-biphosphatase activity that catalyzes the degradation of fructose-2,6-bisphosphate. This protein regulates fructose-2,6-bisphosphate levels in the heart, while a related enzyme encoded by a different gene regulates fructose-2,6-bisphosphate levels in the liver and muscle. This enzyme functions as a homodimer. Two transcript variants encoding two different isoforms have been found for this gene. 5208 PFKFB2 6-phosphofructo-2-kinase/fructose-2,6-biphosphatase 2 ENSG00000123836 NA
The protein encoded by this gene plays a key role in the acute regulation of steroid hormone synthesis by enhancing the conversion of cholesterol into pregnenolone. This protein permits the cleavage of cholesterol into pregnenolone by mediating the transport of cholesterol from the outer mitochondrial membrane to the inner mitochondrial membrane. Mutations in this gene are a cause of congenital lipoid adrenal hyperplasia (CLAH), also called lipoid CAH. A pseudogene of this gene is located on chromosome 13. 6770 STAR steroidogenic acute regulatory protein ENSG00000147465 NA
Lactic acid and pyruvate transport across plasma membranes is catalyzed by members of the proton-linked monocarboxylate transporter (MCT) family, which has been designated solute carrier family-16. Each MCT appears to have slightly different substrate and inhibitor specificities and transport kinetics, which are related to the metabolic requirements of the tissues in which it is found. The MCTs, which include MCT1 (SLC16A1; MIM 600682) and MCT2 (SLC16A7; MIM 603654), are characterized by 12 predicted transmembrane domains (Price et al., 1998 [PubMed 9425115]). 9123 SLC16A3 solute carrier family 16 member 3 ENSG00000141526 NA
NA 84518 CNFN cornifelin ENSG00000105427 NA
Myosins are actin-based motor proteins that function in the generation of mechanical force in eukaryotic cells. Muscle myosins are heterohexamers composed of 2 myosin heavy chains and 2 pairs of nonidentical myosin light chains. This gene encodes a member of the class II or conventional myosin heavy chains, and functions in skeletal muscle contraction. This gene is found in a cluster of myosin heavy chain genes on chromosome 17. A mutation in this gene results in inclusion body myopathy-3. Multiple alternatively spliced variants, encoding the same protein, have been identified. 4620 MYH2 myosin, heavy chain 2, skeletal muscle, adult ENSG00000125414 NA
NA 84886 C1orf198 chromosome 1 open reading frame 198 ENSG00000119280 NA
The yeast heterotetrameric GINS complex is made up of Sld5 (GINS4; MIM 610611), Psf1, Psf2 (GINS2; MIM 610609), and Psf3 (GINS3; MIM 610610). The formation of the GINS complex is essential for the initiation of DNA replication in yeast and Xenopus egg extracts (Ueno et al., 2005 [PubMed 16287864]). 9837 GINS1 GINS complex subunit 1 ENSG00000101003 NA
NA ENSG00000232093 RP11-307C12.11 NA ENSG00000232093 NA
NA ENSG00000219435 TEX40 testis expressed 40 ENSG00000219435 NA
NA 143903 LAYN layilin ENSG00000204381 NA
This gene encodes a cytoskeletal LIM protein that binds to actin filaments via a domain that is homologous to erythrocyte dematin. LIM domains, found in over 60 proteins, play key roles in the regulation of developmental pathways. LIM domains also function as protein-binding interfaces, mediating specific protein-protein interactions. The protein encoded by this gene could mediate such interactions between actin filaments and cytoplasmic targets. Alternatively spliced transcript variants encoding different isoforms have been identified. 3983 ABLIM1 actin binding LIM protein 1 ENSG00000099204 NA
This gene encodes a protease that removes the N-terminal peroxisomal targeting signal (PTS2) from proteins produced in the cytosol, thereby facilitating their import into the peroxisome. The encoded protein is also capable of removing the C-terminal peroxisomal targeting signal (PTS1) from proteins in the peroxisomal matrix. The full-length protein undergoes self-cleavage to produce shorter, potentially inactive, peptides. Alternative splicing results in multiple transcript variants for this gene. 219743 TYSND1 trypsin domain containing 1 ENSG00000156521 NA
This gene encodes a protein subunit of the GINS heterotetrameric complex, which is essential for the initiation of DNA replication and replisome progression in eukaryotes. Alternatively spliced transcript variants encoding distinct isoforms have been described. 64785 GINS3 GINS complex subunit 3 ENSG00000181938 NA
This gene encodes a protein containing a calponin homology (CH) domain, a PDZ domain, and a LIM domain, and may be involved in protein-protein interactions. Several alternatively spliced transcript variants encoding different isoforms have been found for this gene; however, the full-length nature of some variants is not known. 4008 LMO7 LIM domain 7 ENSG00000136153 NA
NA ENSG00000222112 RN7SKP16 RNA, 7SK small nuclear pseudogene 16 ENSG00000222112 NA
The human alpha globin gene cluster located on chromosome 16 spans about 30 kb and includes seven loci: 5’- zeta - pseudozeta - mu - pseudoalpha-1 - alpha-2 - alpha-1 - theta - 3’. The alpha-2 (HBA2) and alpha-1 (HBA1) coding sequences are identical. These genes differ slightly over the 5’ untranslated regions and the introns, but they differ significantly over the 3’ untranslated regions. Two alpha chains plus two beta chains constitute HbA, which in normal adult life comprises about 97% of the total hemoglobin; alpha chains combine with delta chains to constitute HbA-2, which with HbF (fetal hemoglobin) makes up the remaining 3% of adult hemoglobin. Alpha thalassemias result from deletions of each of the alpha genes as well as deletions of both HBA2 and HBA1; some nondeletion alpha thalassemias have also been reported. 3039 HBA1 hemoglobin subunit alpha 1 ENSG00000206172 NA
Myosin is a hexameric ATPase cellular motor protein. It is composed of two heavy chains, two nonphosphorylatable alkali light chains, and two phosphorylatable regulatory light chains. This gene encodes a myosin alkali light chain expressed in fast skeletal muscle. Two transcript variants have been identified for this gene. 4632 MYL1 myosin light chain 1 ENSG00000168530 NA
NA 84793 FOXD2-AS1 FOXD2 antisense RNA 1 (head to head) ENSG00000237424 NA
NA ENSG00000267379 CTC-548K16.5 NA ENSG00000267379 NA
The protein encoded by this gene is a component of the SMAD pathway, which regulates cell growth and differentiation through transforming growth factor-beta (TGFB). In the absence of ligand, the encoded protein binds to the promoter region of TGFB-responsive genes and recruits a nuclear repressor complex. TGFB signaling causes SMAD3 to enter the nucleus and degrade this protein, allowing these genes to be activated. Four transcript variants encoding three different isoforms have been found for this gene. 6498 SKIL SKI-like proto-oncogene ENSG00000136603 NA
NA ENSG00000253392 AC006277.2 NA ENSG00000253392 NA
The protein encoded by this gene has a long and a short form, generated by use of alternative translational start codons. The long form is expressed in steroidogenic tissues such as testis, where it converts cholesteryl esters to free cholesterol for steroid hormone production. The short form is expressed in adipose tissue, among others, where it hydrolyzes stored triglycerides to free fatty acids. 3991 LIPE lipase E, hormone sensitive type ENSG00000079435 NA
The protein encoded by this gene belongs to the ‘regulator of G protein signaling’ family. It inhibits signal transduction by increasing the GTPase activity of G protein alpha subunits. It also may play a role in regulating the kinetics of signaling in the phototransduction cascade. 6004 RGS16 regulator of G-protein signaling 16 ENSG00000143333 NA
This gene encodes the cytosolic form of serine hydroxymethyltransferase, a pyridoxal phosphate-containing enzyme that catalyzes the reversible conversion of serine and tetrahydrofolate to glycine and 5,10-methylene tetrahydrofolate. This reaction provides one-carbon units for synthesis of methionine, thymidylate, and purines in the cytoplasm. This gene is located within the Smith-Magenis syndrome region on chromosome 17. A pseudogene of this gene is located on the short arm of chromosome 1. Alternative splicing results in multiple transcript variants. 6470 SHMT1 serine hydroxymethyltransferase 1 ENSG00000176974 NA
This gene encodes a member of the transforming growth factor beta (TGFB) family of cytokines, which are multifunctional peptides that regulate proliferation, differentiation, adhesion, migration, and other functions in many cell types by transducing their signal through combinations of transmembrane type I and type II receptors (TGFBR1 and TGFBR2) and their downstream effectors, the SMAD proteins. Disruption of the TGFB/SMAD pathway has been implicated in a variety of human cancers. The encoded protein is secreted and has suppressive effects on interleukin-2 dependent T-cell growth. Translocation t(1;7)(q41;p21) between this gene and HDAC9 is associated with Peters’ anomaly, a congenital defect of the anterior chamber of the eye. Knockout mice lacking this gene show perinatal mortality and a wide range of developmental defects, including cardiac defects. Alternatively spliced transcript variants encoding different isoforms have been identified. 7042 TGFB2 transforming growth factor beta 2 ENSG00000092969 NA
NA NA NA NA ENSG00000256545 TRUE
FAAP24 is a component of the Fanconi anemia (FA) core complex (see MIM 227650), which plays a crucial role in DNA damage response (Ciccia et al., 2007 [PubMed 17289582]). 91442 FAAP24 Fanconi anemia core complex associated protein 24 ENSG00000131944 NA
NA ENSG00000256462 RP11-116G8.5 NA ENSG00000256462 NA
NA 65985 AACS acetoacetyl-CoA synthetase ENSG00000081760 NA
This gene encodes a member of the TGF-beta family of proteins. The encoded protein is secreted and is involved in embryogenesis and cell differentiation. Defects in this gene are a cause of familial arrhythmogenic right ventricular dysplasia 1. 7043 TGFB3 transforming growth factor beta 3 ENSG00000119699 NA
This antimicrobial gene encodes a member of the CXC subfamily of chemokines. The encoded protein is a secreted growth factor that signals through the G-protein coupled receptor, CXC receptor 2. This protein plays a role in inflammation and as a chemoattractant for neutrophils. Aberrant expression of this protein is associated with the growth and progression of certain tumors. A naturally occurring processed form of this protein has increased chemotactic activity. Alternate splicing results in coding and non-coding variants of this gene. A pseudogene of this gene is found on chromosome 4. 2919 CXCL1 C-X-C motif chemokine ligand 1 ENSG00000163739 NA
This gene is one of several cytokine genes clustered on the q-arm of chromosome 17. Chemokines are a superfamily of secreted proteins involved in immunoregulatory and inflammatory processes. The superfamily is divided into four subfamilies based on the arrangement of N-terminal cysteine residues of the mature peptide. This chemokine is a member of the CC subfamily which is characterized by two adjacent cysteine residues. This cytokine displays chemotactic activity for monocytes and basophils but not for neutrophils or eosinophils. It has been implicated in the pathogenesis of diseases characterized by monocytic infiltrates, like psoriasis, rheumatoid arthritis and atherosclerosis. It binds to chemokine receptors CCR2 and CCR4. 6347 CCL2 C-C motif chemokine ligand 2 ENSG00000108691 NA
NA ENSG00000254272 RP11-382J24.2 NA ENSG00000254272 NA
This gene encodes a type I transmembrane protein that is localized to junctional complexes between endothelial and epithelial cells and may have a role in cell-cell adhesion. Expression of this gene in white adipose tissue is implicated in adipocyte maturation and development of obesity. This gene is also essential for normal intestinal development and mutations in the gene are associated with congenital short bowel syndrome. 79827 CLMP CXADR-like membrane protein ENSG00000166250 NA
NA 113146 AHNAK2 AHNAK nucleoprotein 2 ENSG00000185567 NA
This gene is a member of the visinin/recoverin subfamily of neuronal calcium sensor proteins. The encoded protein is strongly expressed in granule cells of the cerebellum where it associates with membranes in a calcium-dependent manner and modulates intracellular signaling pathways of the central nervous system by directly or indirectly regulating the activity of adenylyl cyclase. Alternatively spliced transcript variants have been observed, but their full-length nature has not been determined. 7447 VSNL1 visinin like 1 ENSG00000163032 NA
This gene encodes a member of the desmocollin protein subfamily. Desmocollins, along with desmogleins, are cadherin-like transmembrane glycoproteins that are major components of the desmosome. Desmosomes are cell-cell junctions that help resist shearing forces and are found in high concentrations in cells subject to mechanical stress. This gene is found in a cluster with other desmocollin family members on chromosome 18. Mutations in this gene are associated with arrhythmogenic right ventricular dysplasia-11, and reduced protein expression has been described in several types of cancer. Alternative splicing results in multiple transcript variants. 1824 DSC2 desmocollin 2 ENSG00000134755 NA
This gene is a member of the immunoglobulin superfamily. The encoded poly-Ig receptor binds polymeric immunoglobulin molecules at the basolateral surface of epithelial cells; the complex is then transported across the cell to be secreted at the apical surface. A significant association was found between immunoglobulin A nephropathy and several SNPs in this gene. 5284 PIGR polymeric immunoglobulin receptor ENSG00000162896 NA
This gene encodes the receptor for urokinase plasminogen activator and, given its role in localizing and promoting plasmin formation, likely influences many normal and pathological processes related to cell-surface plasminogen activation and localized degradation of the extracellular matrix. It binds both the proprotein and mature forms of urokinase plasminogen activator and permits the activation of the receptor-bound pro-enzyme by plasmin. The protein lacks transmembrane or cytoplasmic domains and may be anchored to the plasma membrane by a glycosyl-phosphatidylinositol (GPI) moiety following cleavage of the nascent polypeptide near its carboxy-terminus. However, a soluble protein is also produced in some cell types. Alternative splicing results in multiple transcript variants encoding different isoforms. The proprotein experiences several post-translational cleavage reactions that have not yet been fully defined. 5329 PLAUR plasminogen activator, urokinase receptor ENSG00000011422 NA
Defensins are a family of antimicrobial and cytotoxic peptides thought to be involved in host defense. They are abundant in the granules of neutrophils and also found in the epithelia of mucosal surfaces such as those of the intestine, respiratory tract, urinary tract, and vagina. Members of the defensin family are highly similar in protein sequence and distinguished by a conserved cysteine motif. The protein encoded by this gene, defensin, alpha 3, is found in the microbicidal granules of neutrophils and likely plays a role in phagocyte-mediated host defense. Several alpha defensin genes are clustered on chromosome 8. This gene differs from defensin, alpha 1 by only one amino acid. This gene and the gene encoding defensin, alpha 1 are both subject to copy number variation. 1668 DEFA3 defensin alpha 3 ENSG00000239839 NA
Defensins are a family of antimicrobial and cytotoxic peptides thought to be involved in host defense. They are abundant in the granules of neutrophils and also found in the epithelia of mucosal surfaces such as those of the intestine, respiratory tract, urinary tract, and vagina. Members of the defensin family are highly similar in protein sequence and distinguished by a conserved cysteine motif. The protein encoded by this gene, defensin, alpha 1, is found in the microbicidal granules of neutrophils and likely plays a role in phagocyte-mediated host defense. Several alpha defensin genes are clustered on chromosome 8. This gene differs from defensin, alpha 3 by only one amino acid. This gene and the gene encoding defensin, alpha 3 are both subject to copy number variation. 1667 DEFA1 defensin alpha 1 ENSG00000239839 NA
Defensins are a family of antimicrobial and cytotoxic peptides thought to be involved in host defense. They are abundant in the granules of neutrophils and also found in the epithelia of mucosal surfaces such as those of the intestine, respiratory tract, urinary tract, and vagina. Members of the defensin family are highly similar in protein sequence and distinguished by a conserved cysteine motif. The protein encoded by this gene, defensin, alpha 1, is found in the microbicidal granules of neutrophils and likely plays a role in phagocyte-mediated host defense. Several alpha defensin genes are clustered on chromosome 8. This gene differs from defensin, alpha 3 by only one amino acid. This gene and the gene encoding defensin, alpha 3 are both subject to copy number variation. Two transcript variants encoding different isoforms have been found for this gene. 728358 DEFA1B defensin alpha 1B ENSG00000239839 NA
NA 80336 PABPC1L poly(A) binding protein cytoplasmic 1 like ENSG00000101104 NA
Troponin proteins associate with tropomyosin and regulate the calcium sensitivity of the myofibril contractile apparatus of striated muscles. Troponin I (TnI), along with troponin T (TnT) and troponin C (TnC), is one of 3 subunits that form the troponin complex of the thin filaments of striated muscle. TnI is the inhibitory subunit, blocking actin-myosin interactions and thereby mediating striated muscle relaxation. The TnI subfamily contains three genes: TnI-skeletal-fast-twitch, TnI-skeletal-slow-twitch, and TnI-cardiac. The TnI-fast and TnI-slow genes are expressed in fast-twitch and slow-twitch skeletal muscle fibers, respectively, while the TnI-cardiac gene is expressed exclusively in cardiac muscle tissue. This gene encodes the Troponin-I-skeletal-slow-twitch protein. This gene is expressed in cardiac and skeletal muscle during early development but is restricted to slow-twitch skeletal muscle fibers in adults. The encoded protein prevents muscle contraction by inhibiting calcium-mediated conformational changes in actin-myosin complexes. 7135 TNNI1 troponin I1, slow skeletal type ENSG00000159173 NA
NA 65989 DLK2 delta like non-canonical Notch ligand 2 ENSG00000171462 NA
NA ENSG00000230530 LIMD1-AS1 LIMD1 antisense RNA 1 ENSG00000230530 NA
This gene encodes a member of the epidermal growth factor (EGF) receptor family of receptor tyrosine kinases. This protein has no ligand binding domain of its own and therefore cannot bind growth factors. However, it does bind tightly to other ligand-bound EGF receptor family members to form a heterodimer, stabilizing ligand binding and enhancing kinase-mediated activation of downstream signalling pathways, such as those involving mitogen-activated protein kinase and phosphatidylinositol-3 kinase. Allelic variations at amino acid positions 654 and 655 of isoform a (positions 624 and 625 of isoform b) have been reported, with the most common allele, Ile654/Ile655, shown here. Amplification and/or overexpression of this gene has been reported in numerous cancers, including breast and ovarian tumors. Alternative splicing results in several additional transcript variants, some encoding different isoforms and others that have not been fully characterized. 2064 ERBB2 erb-b2 receptor tyrosine kinase 2 ENSG00000141736 NA
This gene encodes a member of the EGF-TM7 subfamily of adhesion G protein-coupled receptors, which mediate cell-cell interactions. These proteins are cleaved by self-catalytic proteolysis into a large extracellular subunit and seven-span transmembrane subunit, which associate at the cell surface as a receptor complex. The encoded protein may play a role in cell adhesion as well as leukocyte recruitment, activation and migration, and contains multiple extracellular EGF-like repeats which mediate binding to chondroitin sulfate and the cell surface complement regulatory protein CD55. Expression of this gene may play a role in the progression of several types of cancer. Alternatively spliced transcript variants encoding multiple isoforms with 3 to 5 EGF-like repeats have been observed for this gene. This gene is found in a cluster with other EGF-TM7 genes on the short arm of chromosome 19. 976 ADGRE5 adhesion G protein-coupled receptor E5 ENSG00000123146 NA
NA NA NA NA ENSG00000256005 TRUE
This gene encodes one of the six subunits of type IV collagen, the major structural component of basement membranes. This particular collagen IV subunit, however, is only found in a subset of basement membranes. Like the other members of the type IV collagen gene family, this gene is organized in a head-to-head conformation with another type IV collagen gene so that each gene pair shares a common promoter. Mutations in this gene are associated with type II autosomal recessive Alport syndrome (hereditary glomerulonephropathy) and with familial benign hematuria (thin basement membrane disease). Two transcripts, differing only in their transcription start sites, have been identified for this gene and, as is common for collagen genes, multiple polyadenylation sites are found in the 3’ UTR. 1286 COL4A4 collagen type IV alpha 4 chain ENSG00000081052 NA
NA 57467 HHATL hedgehog acyltransferase-like ENSG00000010282 NA
This gene is a type I subclass member of the Reg gene family. The Reg gene family is a multigene family grouped into four subclasses, types I, II, III and IV, based on the primary structures of the encoded proteins. This gene encodes a protein that is secreted by the exocrine pancreas. It is associated with islet cell regeneration and diabetogenesis and may be involved in pancreatic lithogenesis. Reg family members REG1B, REGL, PAP and this gene are tandemly clustered on chromosome 2p12 and may have arisen from the same ancestral gene by gene duplication. 5967 REG1A regenerating family member 1 alpha ENSG00000115386 NA
NA 55659 ZNF416 zinc finger protein 416 ENSG00000083817 NA
Neurogranin (NRGN) is the human homolog of the neuron-specific rat RC3/neurogranin gene. This gene encodes a postsynaptic protein kinase substrate that binds calmodulin in the absence of calcium. The NRGN gene contains four exons and three introns. Exons 1 and 2 encode the protein and exons 3 and 4 contain untranslated sequences. It is suggested that NRGN is a direct target for thyroid hormone in human brain, and that control of expression of this gene could underlie many of the consequences of hypothyroidism on mental states during development as well as in adult subjects. 4900 NRGN neurogranin ENSG00000154146 NA
This gene encodes a member of the semicarbazide-sensitive amine oxidase family. Copper amine oxidases catalyze the oxidative conversion of amines to aldehydes in the presence of copper and quinone cofactor. The encoded protein is localized to the cell surface, has adhesive properties as well as monoamine oxidase activity, and may be involved in leukocyte trafficking. Alterations in levels of the encoded protein may be associated with many diseases, including diabetes mellitus. A pseudogene of this gene has been described and is located approximately 9-kb downstream on the same chromosome. Alternative splicing results in multiple transcript variants. 8639 AOC3 amine oxidase, copper containing 3 ENSG00000131471 NA
Acetyl-CoA carboxylase (ACC) is a complex multifunctional enzyme system. ACC is a biotin-containing enzyme which catalyzes the carboxylation of acetyl-CoA to malonyl-CoA, the rate-limiting step in fatty acid synthesis. ACC-beta is thought to control fatty acid oxidation by means of the ability of malonyl-CoA to inhibit carnitine-palmitoyl-CoA transferase I, the rate-limiting step in fatty acid uptake and oxidation by mitochondria. ACC-beta may be involved in the regulation of fatty acid oxidation, rather than fatty acid biosynthesis. There is evidence for the presence of two ACC-beta isoforms. 32 ACACB acetyl-CoA carboxylase beta ENSG00000076555 NA
This gene encodes a member of the cytokine family. The protein contains a tyrosine sulfation site, 3 potential N-myristoylation sites, multiple putative phosphorylation sites, and an RGD cell-attachment sequence. Expression of this protein is increased after the activation of T-cells by mitogens or the activation of NK cells by IL-2. This protein induces the production of TNFalpha from macrophage cells. Alternate transcriptional splice variants, encoding different isoforms, have been characterized. 9235 IL32 interleukin 32 ENSG00000008517 NA
Guanine nucleotide binding proteins are heterotrimeric signal-transducing molecules consisting of alpha, beta, and gamma subunits. The alpha subunit binds guanine nucleotide, can hydrolyze GTP, and can interact with other proteins. The protein encoded by this gene represents the alpha subunit of an inhibitory complex. The encoded protein is part of a complex that responds to beta-adrenergic signals by inhibiting adenylate cyclase. Two transcript variants encoding different isoforms have been found for this gene. 2770 GNAI1 G protein subunit alpha i1 ENSG00000127955 NA
NA ENSG00000177337 DLGAP1-AS1 DLGAP1 antisense RNA 1 ENSG00000177337 NA
Mutations in the Schizosaccharomyces pombe Rae1 and Saccharomyces cerevisiae Gle2 genes have been shown to result in accumulation of poly(A)-containing mRNA in the nucleus, suggesting that the encoded proteins are involved in RNA export. The protein encoded by this gene is a homolog of yeast Rae1. It contains four WD40 motifs, and has been shown to localize to distinct foci in the nucleoplasm, to the nuclear rim, and to meshwork-like structures throughout the cytoplasm. This gene is thought to be involved in nucleocytoplasmic transport, and in directly or indirectly attaching cytoplasmic mRNPs to the cytoskeleton. Alternatively spliced transcript variants encoding the same protein have been found for this gene. 8480 RAE1 ribonucleic acid export 1 ENSG00000101146 NA
NA ENSG00000267249 RP11-973H7.3 NA ENSG00000267249 NA
NA ENSG00000269463 RP11-727F15.13 NA ENSG00000269463 NA
Histones are basic nuclear proteins that are responsible for the nucleosome structure of the chromosomal fiber in eukaryotes. Two molecules of each of the four core histones (H2A, H2B, H3, and H4) form an octamer, around which approximately 146 bp of DNA is wrapped in repeating units, called nucleosomes. The linker histone, H1, interacts with linker DNA between nucleosomes and functions in the compaction of chromatin into higher order structures. This gene is intronless and encodes a replication-dependent histone that is a member of the histone H4 family. Transcripts from this gene lack polyA tails but instead contain a palindromic termination element. This gene is found in the large histone gene cluster on chromosome 6. 8365 HIST1H4H histone cluster 1, H4h ENSG00000158406 NA
This gene encodes a component of vacuolar ATPase (V-ATPase), a multisubunit enzyme that mediates acidification of eukaryotic intracellular organelles. V-ATPase dependent organelle acidification is necessary for such intracellular processes as protein sorting, zymogen activation, receptor-mediated endocytosis, and synaptic vesicle proton gradient generation. V-ATPase is composed of a cytosolic V1 domain and a transmembrane V0 domain. The V1 domain consists of three A, three B, and two G subunits, as well as a C, D, E, F, and H subunit. The V1 domain contains the ATP catalytic site. This gene encodes alternate transcriptional splice variants, encoding different V1 domain C subunit isoforms. 245973 ATP6V1C2 ATPase H+ transporting V1 subunit C2 ENSG00000143882 NA
This gene encodes a member of the type 3 G protein-coupled receptor family, characterized by the signature 7-transmembrane domain motif. The encoded protein may be involved in interaction between retinoic acid and G protein signalling pathways. Retinoic acid plays a critical role in development, cellular growth, and differentiation. This gene may play a role in embryonic development and epithelial cell differentiation. 9052 GPRC5A G protein-coupled receptor class C group 5 member A ENSG00000013588 NA
The protein encoded by this gene belongs to a family of proteins thought to play a role in the exocytosis of synaptic vesicles. Vesicle exocytosis releases vesicular contents and is important to various cellular functions. For instance, the secretion of transmitters from neurons plays an important role in synaptic transmission. After exocytosis, the membrane and proteins from the vesicle are retrieved from the plasma membrane through the process of endocytosis. Mutations in this gene have been identified as one cause of fever-associated epilepsy syndromes. A possible link between this gene and Parkinson’s disease has also been suggested. 112755 STX1B syntaxin 1B ENSG00000099365 NA
NA ENSG00000262251 RP11-199F11.2 NA ENSG00000262251 NA
This gene encodes the pulmonary-associated surfactant protein C (SPC), an extremely hydrophobic surfactant protein essential for lung function and homeostasis after birth. Pulmonary surfactant is a surface-active lipoprotein complex composed of 90% lipids and 10% proteins which include plasma proteins and apolipoproteins SPA, SPB, SPC and SPD. The surfactant is secreted by the alveolar cells of the lung and maintains the stability of pulmonary tissue by reducing the surface tension of fluids that coat the lung. Multiple mutations in this gene have been identified, which cause pulmonary surfactant metabolism dysfunction type 2, also called pulmonary alveolar proteinosis due to surfactant protein C deficiency, and are associated with interstitial lung disease in older infants, children, and adults. Alternatively spliced transcript variants encoding different protein isoforms have been identified. 6440 SFTPC surfactant protein C ENSG00000168484 NA
NA 79000 AUNIP aurora kinase A and ninein interacting protein ENSG00000127423 NA
NA 253635 GPATCH11 G-patch domain containing 11 ENSG00000152133 NA
NA 27106 ARRDC2 arrestin domain containing 2 ENSG00000105643 NA
NA 124976 SPNS2 sphingolipid transporter 2 ENSG00000183018 NA
The gamma globin genes (HBG1 and HBG2) are normally expressed in the fetal liver, spleen and bone marrow. Two gamma chains together with two alpha chains constitute fetal hemoglobin (HbF) which is normally replaced by adult hemoglobin (HbA) at birth. In some beta-thalassemias and related conditions, gamma chain production continues into adulthood. The two types of gamma chains differ at residue 136 where glycine is found in the G-gamma product (HBG2) and alanine is found in the A-gamma product (HBG1). The former is predominant at birth. The order of the genes in the beta-globin cluster is: 5’- epsilon – gamma-G – gamma-A – delta – beta–3’. 3048 HBG2 hemoglobin subunit gamma 2 ENSG00000196565 NA
The protein encoded by this gene is an adenosine receptor that belongs to the G-protein coupled receptor 1 family. There are 3 types of adenosine receptors, each with a specific pattern of ligand binding and tissue distribution, and together they regulate a diverse set of physiologic functions. The type A1 receptors inhibit adenylyl cyclase, and play a role in the fertilization process. Animal studies also suggest a role for A1 receptors in kidney function and ethanol intoxication. Transcript variants with alternative splicing in the 5’ UTR have been found for this gene. 134 ADORA1 adenosine A1 receptor ENSG00000163485 NA
NA 441478 NRARP NOTCH-regulated ankyrin repeat protein ENSG00000198435 NA
NA 151246 SGO2 shugoshin 2 ENSG00000163535 NA
This gene encodes a member of the cytochrome P450 superfamily of enzymes. The cytochrome P450 proteins are monooxygenases which catalyze many reactions involved in drug metabolism and synthesis of cholesterol, steroids and other lipids. This protein localizes to the mitochondrial inner membrane and is involved in the conversion of progesterone to cortisol in the adrenal cortex. Mutations in this gene cause congenital adrenal hyperplasia due to 11-beta-hydroxylase deficiency. Transcript variants encoding different isoforms have been noted for this gene. 1584 CYP11B1 cytochrome P450 family 11 subfamily B member 1 ENSG00000160882 NA
Histones are basic nuclear proteins that are responsible for the nucleosome structure of the chromosomal fiber in eukaryotes. This structure consists of approximately 146 bp of DNA wrapped around a nucleosome, an octamer composed of pairs of each of the four core histones (H2A, H2B, H3, and H4). The chromatin fiber is further compacted through the interaction of a linker histone, H1, with the DNA between the nucleosomes to form higher order chromatin structures. This gene encodes a replication-dependent histone that is a member of the histone H2B family and is found in a histone cluster on chromosome 1. 440689 HIST2H2BF histone cluster 2, H2bf ENSG00000203814 NA
NA NA NA NA ENSG00000156750 TRUE
This gene encodes a member of the tyrosine phosphatase family of proteins and exhibits dual specificity by dephosphorylating tyrosine as well as serine and threonine residues. This gene has been described as both a tumor suppressor and an oncogene depending on the cellular context. This protein may regulate neuronal proliferation and has been implicated in the progression of glioblastoma through its ability to dephosphorylate the p53 tumor suppressor. Alternative splicing results in multiple transcript variants. 78986 DUSP26 dual specificity phosphatase 26 (putative) ENSG00000133878 NA
The Rab subfamily of small GTPases plays an important role in the regulation of membrane trafficking. RAB17 is an epithelial cell-specific GTPase (Lutcke et al., 1993 [PubMed 8486736]). 64284 RAB17 RAB17, member RAS oncogene family ENSG00000124839 NA
This gene encodes a member of the steroid-thyroid hormone-retinoid receptor superfamily. The encoded protein may act as a transcriptional activator. The protein can efficiently bind the NGFI-B Response Element (NBRE). Three different versions of extraskeletal myxoid chondrosarcomas (EMCs) are the result of reciprocal translocations between this gene and other genes. The translocation breakpoints are associated with Nuclear Receptor Subfamily 4, Group A, Member 3 (on chromosome 9) and either Ewing Sarcoma Breakpoint Region 1 (on chromosome 22), RNA Polymerase II, TATA Box-Binding Protein-Associated Factor, 68-KD (on chromosome 17), or Transcription factor 12 (on chromosome 15). Multiple transcript variants encoding different isoforms have been found for this gene. 8013 NR4A3 nuclear receptor subfamily 4 group A member 3 ENSG00000119508 NA
The human alpha globin gene cluster located on chromosome 16 spans about 30 kb and includes seven loci: 5’- zeta - pseudozeta - mu - pseudoalpha-1 - alpha-2 - alpha-1 - theta - 3’. The alpha-2 (HBA2) and alpha-1 (HBA1) coding sequences are identical. These genes differ slightly over the 5’ untranslated regions and the introns, but they differ significantly over the 3’ untranslated regions. Two alpha chains plus two beta chains constitute HbA, which in normal adult life comprises about 97% of the total hemoglobin; alpha chains combine with delta chains to constitute HbA-2, which with HbF (fetal hemoglobin) makes up the remaining 3% of adult hemoglobin. Alpha thalassemias result from deletions of each of the alpha genes as well as deletions of both HBA2 and HBA1; some nondeletion alpha thalassemias have also been reported. 3040 HBA2 hemoglobin subunit alpha 2 ENSG00000188536 NA
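# Save the Ensembl gene IDs queried for factor 15, one per line, with no header, row names, or quotes.
write.table(as.factor(out$query), paste0("../utilities/GTEX2013_sparse_fac_voom/gene_names_clus_",15,".txt"), col.names = FALSE,
            row.names=FALSE, quote=FALSE);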

Factor 16 Annotations

out <- mygene::queryMany(gene_list[16,],  scopes="ensembl.gene", fields=c("name", "summary", "symbol"), species="human");
## Finished
## Pass returnall=TRUE to return lists of duplicate or missing query terms.
kable(as.data.frame(out))
symbol query name X_id summary notfound
IGLC3 ENSG00000211679 immunoglobulin lambda constant 3 (Kern-Oz+ marker) ENSG00000211679 NA NA
IGHM ENSG00000211899 immunoglobulin heavy constant mu ENSG00000211899 NA NA
IGLC1 ENSG00000211675 immunoglobulin lambda constant 1 (Mcg marker) ENSG00000211675 NA NA
IGLL5 ENSG00000254709 immunoglobulin lambda like polypeptide 5 100423062 This gene encodes one of the immunoglobulin lambda-like polypeptides. It is located within the immunoglobulin lambda locus but it does not require somatic rearrangement for expression. The first exon of this gene is unrelated to immunoglobulin variable genes; the second and third exons are the immunoglobulin lambda joining 1 and the immunoglobulin lambda constant 1 gene segments. Alternative splicing results in multiple transcript variants. NA
IGLC2 ENSG00000211677 immunoglobulin lambda constant 2 (Kern-Oz- marker) ENSG00000211677 NA NA
IGHA1 ENSG00000211895 immunoglobulin heavy constant alpha 1 ENSG00000211895 NA NA
IGHA2 ENSG00000211890 immunoglobulin heavy constant alpha 2 (A2m marker) ENSG00000211890 NA NA
SIGLEC10 ENSG00000142512 sialic acid binding Ig like lectin 10 89790 SIGLECs are members of the immunoglobulin superfamily that are expressed on the cell surface. Most SIGLECs have 1 or more cytoplasmic immune receptor tyrosine-based inhibitory motifs, or ITIMs. SIGLECs are typically expressed on cells of the innate immune system, with the exception of the B-cell expressed SIGLEC6 (MIM 604405). NA
CTD-2616J11.3 ENSG00000254760 NA ENSG00000254760 NA NA
CTD-2616J11.2 ENSG00000255441 NA ENSG00000255441 NA NA
JCHAIN ENSG00000132465 joining chain of multimeric IgA and IgM 3512 NA NA
CTSS ENSG00000163131 cathepsin S 1520 The protein encoded by this gene, a member of the peptidase C1 family, is a lysosomal cysteine proteinase that may participate in the degradation of antigenic proteins to peptides for presentation on MHC class II molecules. The encoded protein can function as an elastase over a broad pH range in alveolar macrophages. Alternatively spliced transcript variants encoding distinct isoforms have been found for this gene. NA
SERPINA1 ENSG00000197249 serpin family A member 1 5265 The protein encoded by this gene is secreted and is a serine protease inhibitor whose targets include elastase, plasmin, thrombin, trypsin, chymotrypsin, and plasminogen activator. Defects in this gene can cause emphysema or liver disease. Several transcript variants encoding the same protein have been found for this gene. NA
APOBR ENSG00000184730 apolipoprotein B receptor 55911 Apolipoprotein B48 receptor is a macrophage receptor that binds to the apolipoprotein B48 of dietary triglyceride (TG)-rich lipoproteins. This receptor may provide essential lipids, lipid-soluble vitamins and other nutrients to reticuloendothelial cells. If overwhelmed with elevated plasma triglyceride, the apolipoprotein B48 receptor may contribute to foam cell formation, endothelial dysfunction, and atherothrombogenesis. NA
CYP2S1 ENSG00000167600 cytochrome P450 family 2 subfamily S member 1 29785 This gene encodes a member of the cytochrome P450 superfamily of enzymes. The cytochrome P450 proteins are monooxygenases which catalyze many reactions involved in drug metabolism and synthesis of cholesterol, steroids and other lipids. This protein localizes to the endoplasmic reticulum. In rodents, the homologous protein has been shown to metabolize certain carcinogens; however, the specific function of the human protein has not been determined. NA
CTB-171A8.1 ENSG00000266903 NA ENSG00000266903 NA NA
RP11-731F5.2 ENSG00000253364 NA ENSG00000253364 NA NA
IGHG3 ENSG00000211897 immunoglobulin heavy constant gamma 3 (G3m marker) ENSG00000211897 NA NA
LGALS4 ENSG00000171747 galectin 4 3960 The galectins are a family of beta-galactoside-binding proteins implicated in modulating cell-cell and cell-matrix interactions. The expression of this gene is restricted to small intestine, colon, and rectum, and it is underexpressed in colorectal cancer. NA
TBXAS1 ENSG00000059377 thromboxane A synthase 1 6916 This gene encodes a member of the cytochrome P450 superfamily of enzymes. The cytochrome P450 proteins are monooxygenases which catalyze many reactions involved in drug metabolism and synthesis of cholesterol, steroids and other lipids. However, this protein is considered a member of the cytochrome P450 superfamily on the basis of sequence similarity rather than functional similarity. This endoplasmic reticulum membrane protein catalyzes the conversion of prostaglandin H2 to thromboxane A2, a potent vasoconstrictor and inducer of platelet aggregation. The enzyme plays a role in several pathophysiological processes including hemostasis, cardiovascular disease, and stroke. Alternatively spliced transcript variants encoding different isoforms have been found for this gene. NA
IGHG1 ENSG00000211896 immunoglobulin heavy constant gamma 1 (G1m marker) ENSG00000211896 NA NA
PLAC8 ENSG00000145287 placenta specific 8 51316 NA NA
IGHG2 ENSG00000211893 immunoglobulin heavy constant gamma 2 (G2m marker) ENSG00000211893 NA NA
KRT1 ENSG00000167768 keratin 1 3848 The protein encoded by this gene is a member of the keratin gene family. The type II cytokeratins consist of basic or neutral proteins which are arranged in pairs of heterotypic keratin chains coexpressed during differentiation of simple and stratified epithelial tissues. This type II cytokeratin is specifically expressed in the spinous and granular layers of the epidermis with family member KRT10 and mutations in these genes have been associated with bullous congenital ichthyosiform erythroderma. The type II cytokeratins are clustered in a region of chromosome 12q12-q13. NA
CLDN7 ENSG00000181885 claudin 7 1366 This gene encodes a member of the claudin family. Claudins are integral membrane proteins and components of tight junction strands. Tight junction strands serve as a physical barrier to prevent solutes and water from passing freely through the paracellular space between epithelial or endothelial cell sheets, and also play critical roles in maintaining cell polarity and signal transductions. Differential expression of this gene has been observed in different types of malignancies, including breast cancer, ovarian cancer, hepatocellular carcinomas, urinary tumors, prostate cancer, lung cancer, head and neck cancers, thyroid carcinomas, etc. Alternatively spliced transcript variants encoding different isoforms have been found. NA
EPCAM ENSG00000119888 epithelial cell adhesion molecule 4072 This gene encodes a carcinoma-associated antigen and is a member of a family that includes at least two type I membrane proteins. This antigen is expressed on most normal epithelial cells and gastrointestinal carcinomas and functions as a homotypic calcium-independent cell adhesion molecule. The antigen is being used as a target for immunotherapy treatment of human carcinomas. Mutations in this gene result in congenital tufting enteropathy. NA
SULT1A2 ENSG00000197165 sulfotransferase family 1A member 2 6799 Sulfotransferase enzymes catalyze the sulfate conjugation of many hormones, neurotransmitters, drugs, and xenobiotic compounds. These cytosolic enzymes are different in their tissue distributions and substrate specificities. The gene structure (number and length of exons) is similar among family members. This gene encodes one of two phenol sulfotransferases with thermostable enzyme activity. Two alternatively spliced variants that encode the same protein have been described. NA
P2RX1 ENSG00000108405 purinergic receptor P2X 1 5023 The protein encoded by this gene belongs to the P2X family of G-protein-coupled receptors. These proteins can form homo- and heterotrimers and function as ATP-gated ion channels and mediate rapid and selective permeability to cations. This protein is primarily localized to smooth muscle, where it binds ATP and mediates synaptic transmission between neurons and from neurons to smooth muscle, and may be responsible for sympathetic vasoconstriction in small arteries, arterioles and vas deferens. Mouse studies suggest that this receptor is essential for normal male reproductive function. This protein may also be involved in promoting apoptosis. NA
CD79A ENSG00000105369 CD79a molecule 973 The B lymphocyte antigen receptor is a multimeric complex that includes the antigen-specific component, surface immunoglobulin (Ig). Surface Ig non-covalently associates with two other proteins, Ig-alpha and Ig-beta, which are necessary for expression and function of the B-cell antigen receptor. This gene encodes the Ig-alpha protein of the B-cell antigen component. Alternatively spliced transcript variants encoding different isoforms have been described. NA
SPINK1 ENSG00000164266 serine peptidase inhibitor, Kazal type 1 6690 The protein encoded by this gene is a trypsin inhibitor, which is secreted from pancreatic acinar cells into pancreatic juice. It is thought to function in the prevention of trypsin-catalyzed premature activation of zymogens within the pancreas and the pancreatic duct. Mutations in this gene are associated with hereditary pancreatitis and tropical calcific pancreatitis. NA
SFN ENSG00000175793 stratifin 2810 NA NA
IL1B ENSG00000125538 interleukin 1 beta 3553 The protein encoded by this gene is a member of the interleukin 1 cytokine family. This cytokine is produced by activated macrophages as a proprotein, which is proteolytically processed to its active form by caspase 1 (CASP1/ICE). This cytokine is an important mediator of the inflammatory response, and is involved in a variety of cellular activities, including cell proliferation, differentiation, and apoptosis. The induction of cyclooxygenase-2 (PTGS2/COX2) by this cytokine in the central nervous system (CNS) is found to contribute to inflammatory pain hypersensitivity. This gene and eight other interleukin 1 family genes form a cytokine gene cluster on chromosome 2. NA
LYZ ENSG00000090382 lysozyme 4069 This gene encodes human lysozyme, whose natural substrate is the bacterial cell wall peptidoglycan (cleaving the beta[1-4]glycosidic linkages between N-acetylmuramic acid and N-acetylglucosamine). Lysozyme is one of the antimicrobial agents found in human milk, and is also present in spleen, lung, kidney, white blood cells, plasma, saliva, and tears. The protein has antibacterial activity against a number of bacterial species. Missense mutations in this gene have been identified in heritable renal amyloidosis. NA
STXBP2 ENSG00000076944 syntaxin binding protein 2 6813 This gene encodes a member of the STXBP/unc-18/SEC1 family. The encoded protein is involved in intracellular trafficking, control of SNARE (soluble NSF attachment protein receptor) complex assembly, and the release of cytotoxic granules by natural killer cells. Mutations in this gene are associated with familial hemophagocytic lymphohistiocytosis. Alternatively spliced transcript variants encoding different isoforms have been noted for this gene. NA
MATK ENSG00000007264 megakaryocyte-associated tyrosine kinase 4145 The protein encoded by this gene has amino acid sequence similarity to Csk tyrosine kinase and has the structural features of the CSK subfamily: SRC homology SH2 and SH3 domains, a catalytic domain, a unique N terminus, lack of myristylation signals, lack of a negative regulatory phosphorylation site, and lack of an autophosphorylation site. This protein is thought to play a significant role in the signal transduction of hematopoietic cells. It is able to phosphorylate and inactivate Src family kinases, and may play an inhibitory role in the control of T-cell proliferation. This protein might be involved in signaling in some cases of breast cancer. Three alternatively spliced transcript variants that encode different isoforms have been described for this gene. NA
LTK ENSG00000062524 leukocyte receptor tyrosine kinase 4058 The protein encoded by this gene is a member of the ros/insulin receptor family of tyrosine kinases. Tyrosine-specific phosphorylation of proteins is a key to the control of diverse pathways leading to cell growth and differentiation. Multiple transcript variants encoding different isoforms have been found for this gene. NA
TSPAN1 ENSG00000117472 tetraspanin 1 10103 The protein encoded by this gene is a member of the transmembrane 4 superfamily, also known as the tetraspanin family. Most of these members are cell-surface proteins that are characterized by the presence of four hydrophobic domains. The proteins mediate signal transduction events that play a role in the regulation of cell development, activation, growth and motility. NA
OASL ENSG00000135114 2’-5’-oligoadenylate synthetase like 8638 NA NA
IGHG4 ENSG00000211892 immunoglobulin heavy constant gamma 4 (G4m marker) ENSG00000211892 NA NA
ITGAX ENSG00000140678 integrin subunit alpha X 3687 This gene encodes the integrin alpha X chain protein. Integrins are heterodimeric integral membrane proteins composed of an alpha chain and a beta chain. This protein combines with the beta 2 chain (ITGB2) to form a leukocyte-specific integrin referred to as inactivated-C3b (iC3b) receptor 4 (CR4). The alpha X beta 2 complex seems to overlap the properties of the alpha M beta 2 integrin in the adherence of neutrophils and monocytes to stimulated endothelium cells, and in the phagocytosis of complement coated particles. Two transcript variants encoding different isoforms have been found for this gene. NA
ARHGAP45 ENSG00000180448 Rho GTPase activating protein 45 23526 NA NA
CMPK2 ENSG00000134326 cytidine/uridine monophosphate kinase 2 129607 This gene encodes one of the enzymes in the nucleotide synthesis salvage pathway that may participate in terminal differentiation of monocytic cells. Multiple transcript variants encoding different isoforms have been found for this gene. NA
ST14 ENSG00000149418 suppression of tumorigenicity 14 6768 The protein encoded by this gene is an epithelial-derived, integral membrane serine protease. This protease forms a complex with the Kunitz-type serine protease inhibitor, HAI-1, and is found to be activated by sphingosine 1-phosphate. This protease has been shown to cleave and activate hepatocyte growth factor/scattering factor, and urokinase plasminogen activator, which suggest the function of this protease as an epithelial membrane activator for other proteases and latent growth factors. The expression of this protease has been associated with breast, colon, prostate, and ovarian tumors, which implicates its role in cancer invasion, and metastasis. NA
TPD52 ENSG00000076554 tumor protein D52 7163 NA NA
NA ENSG00000161570 NA NA NA TRUE
RPS6KA1 ENSG00000117676 ribosomal protein S6 kinase A1 6195 This gene encodes a member of the RSK (ribosomal S6 kinase) family of serine/threonine kinases. This kinase contains 2 nonidentical kinase catalytic domains and phosphorylates various substrates, including members of the mitogen-activated kinase (MAPK) signalling pathway. The activity of this protein has been implicated in controlling cell growth and differentiation. Alternate transcriptional splice variants, encoding different isoforms, have been characterized. NA
CARD11 ENSG00000198286 caspase recruitment domain family member 11 84433 The protein encoded by this gene belongs to the membrane-associated guanylate kinase (MAGUK) family, a class of proteins that functions as molecular scaffolds for the assembly of multiprotein complexes at specialized regions of the plasma membrane. This protein is also a member of the CARD protein family, which is defined by carrying a characteristic caspase-associated recruitment domain (CARD). This protein has a domain structure similar to that of CARD14 protein. The CARD domains of both proteins have been shown to specifically interact with BCL10, a protein known to function as a positive regulator of cell apoptosis and NF-kappaB activation. When expressed in cells, this protein activated NF-kappaB and induced the phosphorylation of BCL10. NA
PDIA2 ENSG00000185615 protein disulfide isomerase family A member 2 64714 Protein disulfide isomerases (EC 5.3.4.1), such as PDIP, are endoplasmic reticulum (ER) resident proteins that catalyze protein folding and thiol-disulfide interchange reactions (Desilva et al., 1996 [PubMed 8561901]). NA
SPINT1 ENSG00000166145 serine peptidase inhibitor, Kunitz type 1 6692 The protein encoded by this gene is a member of the Kunitz family of serine protease inhibitors. The protein is a potent inhibitor specific for HGF activator and is thought to be involved in the regulation of the proteolytic activation of HGF in injured tissues. Alternative splicing results in multiple variants encoding different isoforms. NA
CEACAM1 ENSG00000079385 carcinoembryonic antigen related cell adhesion molecule 1 634 This gene encodes a member of the carcinoembryonic antigen (CEA) gene family, which belongs to the immunoglobulin superfamily. Two subgroups of the CEA family, the CEA cell adhesion molecules and the pregnancy-specific glycoproteins, are located within a 1.2 Mb cluster on the long arm of chromosome 19. Eleven pseudogenes of the CEA cell adhesion molecule subgroup are also found in the cluster. The encoded protein was originally described in bile ducts of liver as biliary glycoprotein. Subsequently, it was found to be a cell-cell adhesion molecule detected on leukocytes, epithelia, and endothelia. The encoded protein mediates cell adhesion via homophilic as well as heterophilic binding to other proteins of the subgroup. Multiple cellular activities have been attributed to the encoded protein, including roles in the differentiation and arrangement of tissue three-dimensional structure, angiogenesis, apoptosis, tumor suppression, metastasis, and the modulation of innate and adaptive immune responses. Multiple transcript variants encoding different isoforms have been reported, but the full-length nature of all variants has not been defined. NA
RBM47 ENSG00000163694 RNA binding motif protein 47 54502 NA NA
GALE ENSG00000117308 UDP-galactose-4-epimerase 2582 This gene encodes UDP-galactose-4-epimerase which catalyzes two distinct but analogous reactions: the epimerization of UDP-glucose to UDP-galactose, and the epimerization of UDP-N-acetylglucosamine to UDP-N-acetylgalactosamine. The bifunctional nature of the enzyme has the important metabolic consequence that mutant cells (or individuals) are dependent not only on exogenous galactose, but also on exogenous N-acetylgalactosamine as a necessary precursor for the synthesis of glycoproteins and glycolipids. Mutations in this gene result in epimerase-deficiency galactosemia, also referred to as galactosemia type 3, a disease characterized by liver damage, early-onset cataracts, deafness and mental retardation, with symptoms ranging from mild (‘peripheral’ form) to severe (‘generalized’ form). Multiple alternatively spliced transcripts encoding the same protein have been identified. NA
RP11-703H8.7 ENSG00000255118 NA ENSG00000255118 NA NA
KRT10 ENSG00000186395 keratin 10 3858 This gene encodes a member of the type I (acidic) cytokeratin family, which belongs to the superfamily of intermediate filament (IF) proteins. Keratins are heteropolymeric structural proteins which form the intermediate filament. These filaments, along with actin microfilaments and microtubules, compose the cytoskeleton of epithelial cells. Mutations in this gene are associated with epidermolytic hyperkeratosis. This gene is located within a cluster of keratin family members on chromosome 17q21. NA
TSPAN13 ENSG00000106537 tetraspanin 13 27075 The protein encoded by this gene is a member of the transmembrane 4 superfamily, also known as the tetraspanin family. Most of these members are cell-surface proteins that are characterized by the presence of four hydrophobic domains. The proteins mediate signal transduction events that play a role in the regulation of cell development, activation, growth and motility. NA
TOX ENSG00000198846 thymocyte selection associated high mobility group box 9760 The protein encoded by this gene contains a HMG box DNA binding domain. HMG boxes are found in many eukaryotic proteins involved in chromatin assembly, transcription and replication. This protein may function to regulate T-cell development. NA
IL18RAP ENSG00000115607 interleukin 18 receptor accessory protein 8807 The protein encoded by this gene is an accessory subunit of the heterodimeric receptor for interleukin 18 (IL18), a proinflammatory cytokine involved in inducing cell-mediated immunity. This protein enhances the IL18-binding activity of the IL18 receptor and plays a role in signaling by IL18. Mutations in this gene are associated with Crohn’s disease and inflammatory bowel disease, and susceptibility to celiac disease and leprosy. Alternatively spliced transcript variants of this gene have been described, but their full-length nature is not known. NA
CNN1 ENSG00000130176 calponin 1 1264 NA NA
NLRP3 ENSG00000162711 NLR family pyrin domain containing 3 114548 This gene encodes a pyrin-like protein containing a pyrin domain, a nucleotide-binding site (NBS) domain, and a leucine-rich repeat (LRR) motif. This protein interacts with the apoptosis-associated speck-like protein PYCARD/ASC, which contains a caspase recruitment domain, and is a member of the NALP3 inflammasome complex. This complex functions as an upstream activator of NF-kappaB signaling, and it plays a role in the regulation of inflammation, the immune response, and apoptosis. Mutations in this gene are associated with familial cold autoinflammatory syndrome (FCAS), Muckle-Wells syndrome (MWS), chronic infantile neurological cutaneous and articular (CINCA) syndrome, and neonatal-onset multisystem inflammatory disease (NOMID). Multiple alternatively spliced transcript variants encoding distinct isoforms have been identified for this gene. Alternative 5’ UTR structures are suggested by available data; however, insufficient evidence is available to determine if all of the represented 5’ UTR splice patterns are biologically valid. NA
SLA ENSG00000155926 Src-like-adaptor 6503 NA NA
LTB ENSG00000227507 lymphotoxin beta 4050 Lymphotoxin beta is a type II membrane protein of the TNF family. It anchors lymphotoxin-alpha to the cell surface through heterotrimer formation. The predominant form on the lymphocyte surface is the lymphotoxin-alpha 1/beta 2 complex (e.g. 1 molecule alpha/2 molecules beta) and this complex is the primary ligand for the lymphotoxin-beta receptor. The minor complex is lymphotoxin-alpha 2/beta 1. LTB is an inducer of the inflammatory response system and involved in normal development of lymphoid tissue. Lymphotoxin-beta isoform b is unable to complex with lymphotoxin-alpha suggesting a function for lymphotoxin-beta which is independent of lymphotoxin-alpha. Alternative splicing results in multiple transcript variants encoding different isoforms. NA
ALB ENSG00000163631 albumin 213 Albumin is a soluble, monomeric protein which comprises about one-half of the blood serum protein. Albumin functions primarily as a carrier protein for steroids, fatty acids, and thyroid hormones and plays a role in stabilizing extracellular fluid volume. Albumin is a globular unglycosylated serum protein of molecular weight 65,000. Albumin is synthesized in the liver as preproalbumin which has an N-terminal peptide that is removed before the nascent protein is released from the rough endoplasmic reticulum. The product, proalbumin, is in turn cleaved in the Golgi vesicles to produce the secreted albumin. NA
TREM2 ENSG00000095970 triggering receptor expressed on myeloid cells 2 54209 This gene encodes a membrane protein that forms a receptor signaling complex with the TYRO protein tyrosine kinase binding protein. The encoded protein functions in immune response and may be involved in chronic inflammation by triggering the production of constitutive inflammatory cytokines. Defects in this gene are a cause of polycystic lipomembranous osteodysplasia with sclerosing leukoencephalopathy (PLOSL). Alternative splicing results in multiple transcript variants encoding different isoforms. NA
SOX9 ENSG00000125398 SRY-box 9 6662 The protein encoded by this gene recognizes the sequence CCTTGAG along with other members of the HMG-box class DNA-binding proteins. It acts during chondrocyte differentiation and, with steroidogenic factor 1, regulates transcription of the anti-Muellerian hormone (AMH) gene. Deficiencies lead to the skeletal malformation syndrome campomelic dysplasia, frequently with sex reversal. NA
GPX2 ENSG00000176153 glutathione peroxidase 2 2877 This gene is a member of the glutathione peroxidase family and encodes a selenium-dependent glutathione peroxidase that is one of two isoenzymes responsible for the majority of the glutathione-dependent hydrogen peroxide-reducing activity in the epithelium of the gastrointestinal tract. The protein encoded by this locus contains a selenocysteine (Sec) residue encoded by the UGA codon, which normally signals translation termination. Alternatively spliced transcript variants have been described. NA
SFTPA1 ENSG00000122852 surfactant protein A1 653509 This gene encodes a lung surfactant protein that is a member of a subfamily of C-type lectins called collectins. The encoded protein binds specific carbohydrate moieties found on lipids and on the surface of microorganisms. This protein plays an essential role in surfactant homeostasis and in the defense against respiratory pathogens. Mutations in this gene are associated with idiopathic pulmonary fibrosis. Alternate splicing results in multiple transcript variants. NA
LSR ENSG00000105699 lipolysis stimulated lipoprotein receptor 51599 NA NA
PCED1B-AS1 ENSG00000247774 PCED1B antisense RNA 1 100233209 NA NA
SLC7A7 ENSG00000155465 solute carrier family 7 member 7 9056 The protein encoded by this gene is the light subunit of a cationic amino acid transporter. This sodium-independent transporter is formed when the light subunit encoded by this gene dimerizes with the heavy subunit transporter protein SLC3A2. This transporter is found in epithelial cell membranes where it transfers cationic and large neutral amino acids from the cell to the extracellular space. Defects in this gene are a cause of lysinuric protein intolerance (LPI). Alternative splicing results in multiple transcript variants. NA
SPOCK2 ENSG00000107742 sparc/osteonectin, cwcv and kazal-like domains proteoglycan (testican) 2 9806 This gene encodes a protein which binds with glycosaminoglycans to form part of the extracellular matrix. The protein contains thyroglobulin type-1, follistatin-like, and calcium-binding domains, and has glycosaminoglycan attachment sites in the acidic C-terminal region. Three alternatively spliced transcript variants that encode different protein isoforms have been described for this gene. NA
CD52 ENSG00000169442 CD52 molecule 1043 NA NA
APOH ENSG00000091583 apolipoprotein H 350 Apolipoprotein H has been implicated in a variety of physiologic pathways including lipoprotein metabolism, coagulation, and the production of antiphospholipid autoantibodies. APOH may be a required cofactor for anionic phospholipid binding by the antiphospholipid autoantibodies found in sera of many patients with lupus and primary antiphospholipid syndrome, but it does not seem to be required for the reactivity of antiphospholipid autoantibodies associated with infections. NA
RP11-324O2.3 ENSG00000232934 NA ENSG00000232934 NA NA
ABRACL ENSG00000146386 ABRA C-terminal like 58527 NA NA
RAP1GAP ENSG00000076864 RAP1 GTPase activating protein 5909 This gene encodes a type of GTPase-activating-protein (GAP) that down-regulates the activity of the ras-related RAP1 protein. RAP1 acts as a molecular switch by cycling between an inactive GDP-bound form and an active GTP-bound form. The product of this gene, RAP1GAP, promotes the hydrolysis of bound GTP and hence returns RAP1 to the inactive state whereas other proteins, guanine nucleotide exchange factors (GEFs), act as RAP1 activators by facilitating the conversion of RAP1 from the GDP- to the GTP-bound form. In general, ras subfamily proteins, such as RAP1, play key roles in receptor-linked signaling pathways that control cell growth and differentiation. RAP1 plays a role in diverse processes such as cell proliferation, adhesion, differentiation, and embryogenesis. Alternative splicing results in multiple transcript variants encoding distinct proteins. NA
TNFRSF11A ENSG00000141655 tumor necrosis factor receptor superfamily member 11a 8792 The protein encoded by this gene is a member of the TNF-receptor superfamily. This receptor can interact with various TRAF family proteins, through which this receptor induces the activation of NF-kappa B and MAPK8/JNK. This receptor and its ligand are important regulators of the interaction between T cells and dendritic cells. This receptor is also an essential mediator for osteoclast and lymph node development. Mutations at this locus have been associated with familial expansile osteolysis, autosomal recessive osteopetrosis, and Paget disease of bone. Alternatively spliced transcript variants have been described for this locus. NA
RP11-1143G9.4 ENSG00000257764 NA ENSG00000257764 NA NA
PCP4 ENSG00000183036 Purkinje cell protein 4 5121 NA NA
FUT2 ENSG00000176920 fucosyltransferase 2 2524 The protein encoded by this gene is a Golgi stack membrane protein that is involved in the creation of a precursor of the H antigen, which is required for the final step in the soluble A and B antigen synthesis pathway. This gene is one of two encoding the galactoside 2-L-fucosyltransferase enzyme. Two transcript variants encoding the same protein have been found for this gene. NA
CORO2A ENSG00000106789 coronin 2A 7464 This gene encodes a member of the WD repeat protein family. WD repeats are minimally conserved regions of approximately 40 amino acids typically bracketed by gly-his and trp-asp (GH-WD), which may facilitate formation of heterotrimeric or multiprotein complexes. Members of this family are involved in a variety of cellular processes, including cell cycle progression, signal transduction, apoptosis, and gene regulation. This protein contains 5 WD repeats, and has a structural similarity with actin-binding proteins: the D. discoideum coronin and the human p57 protein, suggesting that this protein may also be an actin-binding protein that regulates cell motility. Alternative splicing of this gene generates 2 transcript variants. NA
CTB-191K22.5 ENSG00000267815 NA ENSG00000267815 NA NA
AKR7A3 ENSG00000162482 aldo-keto reductase family 7 member A3 22977 Aldo-keto reductases, such as AKR7A3, are involved in the detoxification of aldehydes and ketones. NA
KBTBD8 ENSG00000163376 kelch repeat and BTB domain containing 8 84541 NA NA
CD4 ENSG00000010610 CD4 molecule 920 This gene encodes a membrane glycoprotein of T lymphocytes that interacts with major histocompatibility complex class II antigens and is also a receptor for the human immunodeficiency virus. This gene is expressed not only in T lymphocytes, but also in B cells, macrophages, and granulocytes. It is also expressed in specific regions of the brain. The protein functions to initiate or augment the early phase of T-cell activation, and may function as an important mediator of indirect neuronal damage in infectious and immune-mediated diseases of the central nervous system. Multiple alternatively spliced transcript variants encoding different isoforms have been identified in this gene. NA
CYBA ENSG00000051523 cytochrome b-245 alpha chain 1535 Cytochrome b is comprised of a light chain (alpha) and a heavy chain (beta). This gene encodes the light, alpha subunit which has been proposed as a primary component of the microbicidal oxidase system of phagocytes. Mutations in this gene are associated with autosomal recessive chronic granulomatous disease (CGD), that is characterized by the failure of activated phagocytes to generate superoxide, which is important for the microbicidal activity of these cells. NA
BCL2L15 ENSG00000188761 BCL2 like 15 440603 NA NA
LAPTM5 ENSG00000162511 lysosomal protein transmembrane 5 7805 This gene encodes a transmembrane receptor that is associated with lysosomes. The encoded protein, also known as E3 protein, may play a role in hematopoiesis. NA
KCNAB2 ENSG00000069424 potassium voltage-gated channel subfamily A regulatory beta subunit 2 8514 Voltage-gated potassium (Kv) channels represent the most complex class of voltage-gated ion channels from both functional and structural standpoints. Their diverse functions include regulating neurotransmitter release, heart rate, insulin secretion, neuronal excitability, epithelial electrolyte transport, smooth muscle contraction, and cell volume. Four sequence-related potassium channel genes - shaker, shaw, shab, and shal - have been identified in Drosophila, and each has been shown to have human homolog(s). This gene encodes a member of the potassium channel, voltage-gated, shaker-related subfamily. This member is one of the beta subunits, which are auxiliary proteins associating with functional Kv-alpha subunits. This member alters functional properties of the KCNA4 gene product. Alternative splicing of this gene results in multiple transcript variants encoding distinct isoforms. NA
ACVRL1 ENSG00000139567 activin A receptor like type 1 94 This gene encodes a type I cell-surface receptor for the TGF-beta superfamily of ligands. It shares with other type I receptors a high degree of similarity in serine-threonine kinase subdomains, a glycine- and serine-rich region (called the GS domain) preceding the kinase domain, and a short C-terminal tail. The encoded protein, sometimes termed ALK1, shares similar domain structures with other closely related ALK or activin receptor-like kinase proteins that form a subfamily of receptor serine/threonine kinases. Mutations in this gene are associated with hemorrhagic telangiectasia type 2, also known as Rendu-Osler-Weber syndrome 2. NA
KRT7 ENSG00000135480 keratin 7 3855 The protein encoded by this gene is a member of the keratin gene family. The type II cytokeratins consist of basic or neutral proteins which are arranged in pairs of heterotypic keratin chains coexpressed during differentiation of simple and stratified epithelial tissues. This type II cytokeratin is specifically expressed in the simple epithelia lining the cavities of the internal organs and in the gland ducts and blood vessels. The genes encoding the type II cytokeratins are clustered in a region of chromosome 12q12-q13. Alternative splicing may result in several transcript variants; however, not all variants have been fully described. NA
SELL ENSG00000188404 selectin L 6402 This gene encodes a cell surface adhesion molecule that belongs to a family of adhesion/homing receptors. The encoded protein contains a C-type lectin-like domain, a calcium-binding epidermal growth factor-like domain, and two short complement-like repeats. The gene product is required for binding and subsequent rolling of leucocytes on endothelial cells, facilitating their migration into secondary lymphoid organs and inflammation sites. Single-nucleotide polymorphisms in this gene have been associated with various diseases including immunoglobulin A nephropathy. Alternatively spliced transcript variants have been found for this gene. NA
RP11-867G23.8 ENSG00000255468 NA ENSG00000255468 NA NA
DOK3 ENSG00000146094 docking protein 3 79930 NA NA
SNX10 ENSG00000086300 sorting nexin 10 29887 This gene encodes a member of the sorting nexin family. Members of this family contain a phox (PX) domain, which is a phosphoinositide binding domain, and are involved in intracellular trafficking. This protein does not contain a coiled coil region, like some family members. This gene may play a role in regulating endosome homeostasis. Alternative splicing results in multiple transcript variants. NA
FYB ENSG00000082074 FYN binding protein 2533 The protein encoded by this gene is an adapter for the FYN protein and LCP2 signaling cascades in T-cells. The encoded protein is involved in platelet activation and controls the expression of interleukin-2. Three transcript variants encoding different isoforms have been found for this gene. NA
CTD-2020K17.4 ENSG00000233483 NA ENSG00000233483 NA NA
CALML5 ENSG00000178372 calmodulin like 5 51806 This gene encodes a novel calcium binding protein expressed in the epidermis and related to the calmodulin family of calcium binding proteins. Functional studies with recombinant protein demonstrate it does bind calcium and undergoes a conformational change when it does so. Abundant expression is detected only in reconstructed epidermis and is restricted to differentiating keratinocytes. In addition, it can associate with transglutaminase 3, shown to be a key enzyme in the terminal differentiation of keratinocytes. NA
TBX15 ENSG00000092607 T-box 15 6913 This gene belongs to the T-box family of genes, which encode a phylogenetically conserved family of transcription factors that regulate a variety of developmental processes. All these genes contain a common T-box DNA-binding domain. Mutations in this gene are associated with Cousin syndrome. NA
HAVCR2 ENSG00000135077 hepatitis A virus cellular receptor 2 84868 The protein encoded by this gene belongs to the immunoglobulin superfamily, and TIM family of proteins. CD4-positive T helper lymphocytes can be divided into types 1 (Th1) and 2 (Th2) on the basis of their cytokine secretion patterns. Th1 cells are involved in cell-mediated immunity to intracellular pathogens and delayed-type hypersensitivity reactions, whereas, Th2 cells are involved in the control of extracellular helminthic infections and the promotion of atopic and allergic diseases. This protein is a Th1-specific cell surface protein that regulates macrophage activation, and inhibits Th1-mediated auto- and alloimmune responses, and promotes immunological tolerance. NA
SFTPA2 ENSG00000185303 surfactant protein A2 729238 This gene is one of several genes encoding pulmonary-surfactant associated proteins (SFTPA) located on chromosome 10. Mutations in this gene and a highly similar gene located nearby, which affect the highly conserved carbohydrate recognition domain, are associated with idiopathic pulmonary fibrosis. The current version of the assembly displays only a single centromeric SFTPA gene pair rather than the two gene pairs shown in the previous assembly which were thought to have resulted from a duplication. NA
write.table(as.factor(out$query), paste0("../utilities/GTEX2013_sparse_fac_voom/gene_names_clus_",16,".txt"), col.names = FALSE,
            row.names=FALSE, quote=FALSE);
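
The query, render, and write steps repeat for every factor, and the annotation tables carry a notfound column that is TRUE for Ensembl IDs the query could not match. A minimal sketch of a helper that writes one factor's IDs and optionally skips the unmatched ones is given below. The function name `write_factor_queries`, its arguments, and the use of `tempdir()` are hypothetical and not part of the original analysis, which writes every queried ID to `../utilities/`; the toy data frame stands in for a `queryMany()` result so that the block runs without network access.

```r
# Hypothetical helper (not part of the original analysis): write the Ensembl
# IDs of one factor to a text file, optionally skipping IDs that the
# annotation query could not match (notfound == TRUE).
write_factor_queries <- function(out, factor_id, out_dir, drop_notfound = TRUE) {
  df <- as.data.frame(out)
  ids <- as.character(df$query)
  if (drop_notfound && "notfound" %in% names(df)) {
    # notfound is TRUE for unmatched IDs and NA for matched ones
    ids <- ids[!(df$notfound %in% TRUE)]
  }
  path <- file.path(out_dir, paste0("gene_names_clus_", factor_id, ".txt"))
  write.table(ids, path, col.names = FALSE, row.names = FALSE, quote = FALSE)
  invisible(path)
}

# Toy stand-in for a queryMany() result, built from three Factor 17 rows
toy <- data.frame(
  query = c("ENSG00000104140", "ENSG00000205246", "ENSG00000173432"),
  symbol = c("RHOV", NA, "SAA1"),
  notfound = c(NA, TRUE, NA),
  stringsAsFactors = FALSE
)
path <- write_factor_queries(toy, 17, tempdir())
written <- readLines(path)
```

Setting `drop_notfound = FALSE` would reproduce the behaviour of the calls in this report, where every queried ID is written regardless of whether it was annotated.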

Factor 17 Annotations

out <- mygene::queryMany(gene_list[17,],  scopes="ensembl.gene", fields=c("name", "summary", "symbol"), species="human");
## Finished
## Pass returnall=TRUE to return lists of duplicate or missing query terms.
kable(as.data.frame(out))
name X_id query symbol summary notfound
ras homolog family member V 171177 ENSG00000104140 RHOV NA NA
serum amyloid A1 6288 ENSG00000173432 SAA1 This gene encodes a member of the serum amyloid A family of apolipoproteins. The encoded preproprotein is proteolytically processed to generate the mature protein. This protein is a major acute phase protein that is highly expressed in response to inflammation and tissue injury. This protein also plays an important role in HDL metabolism and cholesterol homeostasis. High levels of this protein are associated with chronic inflammatory diseases including atherosclerosis, rheumatoid arthritis, Alzheimer’s disease and Crohn’s disease. This protein may also be a potential biomarker for certain tumors. Alternate splicing results in multiple transcript variants that encode the same protein. A pseudogene of this gene is found on chromosome 11. NA
CD200 molecule 4345 ENSG00000091972 CD200 This gene encodes a type I membrane glycoprotein containing two extracellular immunoglobulin domains, a transmembrane and a cytoplasmic domain. This gene is expressed by various cell types, including B cells, a subset of T cells, thymocytes, endothelial cells, and neurons. The encoded protein plays an important role in immunosuppression and regulation of anti-tumor activity. Alternative splicing results in multiple transcript variants encoding different isoforms. NA
ankyrin repeat domain 22 118932 ENSG00000152766 ANKRD22 NA NA
leucine rich repeat containing 1 55227 ENSG00000137269 LRRC1 NA NA
NA ENSG00000271218 ENSG00000271218 RP3-523E19.2 NA NA
syntaxin 19 415117 ENSG00000178750 STX19 NA NA
delta(4)-desaturase, sphingolipid 2 123099 ENSG00000168350 DEGS2 This gene encodes a bifunctional enzyme that is involved in the biosynthesis of phytosphingolipids in human skin and in other phytosphingolipid-containing tissues. This enzyme can act as a sphingolipid delta(4)-desaturase, and also as a sphingolipid C4-hydroxylase. NA
NA ENSG00000271133 ENSG00000271133 CTA-293F17.1 NA NA
synaptotagmin like 1 84958 ENSG00000142765 SYTL1 NA NA
4-hydroxyphenylpyruvate dioxygenase 3242 ENSG00000158104 HPD The protein encoded by this gene is an enzyme in the catabolic pathway of tyrosine. The encoded protein catalyzes the conversion of 4-hydroxyphenylpyruvate to homogentisate. Defects in this gene are a cause of tyrosinemia type 3 (TYRO3) and hawkinsinuria (HAWK). Two transcript variants encoding different isoforms have been found for this gene. NA
SPARC like 1 8404 ENSG00000152583 SPARCL1 NA NA
chromosome 15 open reading frame 48 84419 ENSG00000166920 C15orf48 This gene was first identified in a study of human esophageal squamous cell carcinoma tissues. Levels of both the message and protein are reduced in carcinoma samples. In adult human tissues, this gene is expressed in the esophagus, stomach, small intestine, colon and placenta. Alternatively spliced transcript variants that encode the same protein have been identified. NA
macrophage stimulating 1 receptor 4486 ENSG00000164078 MST1R This gene encodes a cell surface receptor for macrophage-stimulating protein (MSP) with tyrosine kinase activity. The mature form of this protein is a heterodimer of disulfide-linked alpha and beta subunits, generated by proteolytic cleavage of a single-chain precursor. The beta subunit undergoes tyrosine phosphorylation upon stimulation by MSP. This protein is expressed on the ciliated epithelia of the mucociliary transport apparatus of the lung and, together with MSP, is thought to be involved in host defense. Alternative splicing generates multiple transcript variants encoding different isoforms that may undergo similar proteolytic processing. NA
calponin 1 1264 ENSG00000130176 CNN1 NA NA
junctophilin 1 56704 ENSG00000104369 JPH1 Junctional complexes between the plasma membrane and endoplasmic/sarcoplasmic reticulum are a common feature of all excitable cell types and mediate cross talk between cell surface and intracellular ion channels. The protein encoded by this gene is a component of junctional complexes and is composed of a C-terminal hydrophobic segment spanning the endoplasmic/sarcoplasmic reticulum membrane and a remaining cytoplasmic domain that shows specific affinity for the plasma membrane. This gene is a member of the junctophilin gene family. NA
NA NA ENSG00000205246 NA NA TRUE
NOTCH-regulated ankyrin repeat protein 441478 ENSG00000198435 NRARP NA NA
N-acetylglutamate synthase 162417 ENSG00000161653 NAGS The N-acetylglutamate synthase gene encodes a mitochondrial enzyme that catalyzes the formation of N-acetylglutamate (NAG) from glutamate and acetyl coenzyme-A. NAG is a cofactor of carbamyl phosphate synthetase I (CPSI), the first enzyme of the urea cycle in mammals. This gene may regulate ureagenesis by altering NAG availability and, thereby, CPSI activity. Deficiencies in N-acetylglutamate synthase have been associated with hyperammonemia. NA
apolipoprotein C1 341 ENSG00000130208 APOC1 This gene encodes a member of the apolipoprotein C1 family. This gene is expressed primarily in the liver, and it is activated when monocytes differentiate into macrophages. The encoded protein plays a central role in high density lipoprotein (HDL) and very low density lipoprotein (VLDL) metabolism. This protein has also been shown to inhibit cholesteryl ester transfer protein in plasma. A pseudogene of this gene is located 4 kb downstream in the same orientation, on the same chromosome. This gene is mapped to chromosome 19, where it resides within an apolipoprotein gene cluster. NA
formimidoyltransferase cyclodeaminase 10841 ENSG00000160282 FTCD The protein encoded by this gene is a bifunctional enzyme that channels 1-carbon units from formiminoglutamate, a metabolite of the histidine degradation pathway, to the folate pool. Mutations in this gene are associated with glutamate formiminotransferase deficiency. Alternatively spliced transcript variants have been found for this gene. NA
fatty acyl-CoA reductase 1 84188 ENSG00000197601 FAR1 The protein encoded by this gene is required for the reduction of fatty acids to fatty alcohols, a process that is required for the synthesis of monoesters and ether lipids. NADPH is required as a cofactor in this reaction, and 16-18 carbon saturated and unsaturated fatty acids are the preferred substrate. This is a peroxisomal membrane protein, and studies suggest that the N-terminus contains a large catalytic domain located on the outside of the peroxisome, while the C-terminus is exposed to the matrix of the peroxisome. Studies indicate that the regulation of this protein is dependent on plasmalogen levels. Mutations in this gene have been associated with individuals affected by severe intellectual disability, early-onset epilepsy, microcephaly, congenital cataracts, growth retardation, and spasticity (PMID: 25439727). A pseudogene of this gene is located on chromosome 13. NA
complement C1r subcomponent 715 ENSG00000159403 C1R NA NA
calsequestrin 2 845 ENSG00000118729 CASQ2 The protein encoded by this gene specifies the cardiac muscle family member of the calsequestrin family. Calsequestrin is localized to the sarcoplasmic reticulum in cardiac and slow skeletal muscle cells. The protein is a calcium binding protein that stores calcium for muscle function. Mutations in this gene cause stress-induced polymorphic ventricular tachycardia, also referred to as catecholaminergic polymorphic ventricular tachycardia 2 (CPVT2), a disease characterized by bidirectional ventricular tachycardia that may lead to cardiac arrest. NA
myelin protein zero like 2 10205 ENSG00000149573 MPZL2 Thymus development depends on a complex series of interactions between thymocytes and the stromal component of the organ. Epithelial V-like antigen (EVA) is expressed in thymus epithelium and strongly downregulated by thymocyte developmental progression. This gene is expressed in the thymus and in several epithelial structures early in embryogenesis. It is highly homologous to the myelin protein zero and, in thymus-derived epithelial cell lines, is poorly soluble in nonionic detergents, strongly suggesting an association to the cytoskeleton. Its capacity to mediate cell adhesion through a homophilic interaction and its selective regulation by T cell maturation might imply the participation of EVA in the earliest phases of thymus organogenesis. The protein bears a characteristic V-type domain and two potential N-glycosylation sites in the extracellular domain; a putative serine phosphorylation site for casein kinase 2 is also present in the cytoplasmic tail. Two transcript variants encoding the same protein have been found for this gene. NA
alpha tocopherol transfer protein like 79183 ENSG00000124120 TTPAL NA NA
KIT proto-oncogene receptor tyrosine kinase 3815 ENSG00000157404 KIT This gene encodes the human homolog of the proto-oncogene c-kit. C-kit was first identified as the cellular homolog of the feline sarcoma viral oncogene v-kit. This protein is a type 3 transmembrane receptor for MGF (mast cell growth factor, also known as stem cell factor). Mutations in this gene are associated with gastrointestinal stromal tumors, mast cell disease, acute myelogenous leukemia, and piebaldism. Multiple transcript variants encoding different isoforms have been found for this gene. NA
netrin 1 9423 ENSG00000065320 NTN1 Netrin is included in a family of laminin-related secreted proteins. The function of this gene has not yet been defined; however, netrin is thought to be involved in axon guidance and cell migration during development. Mutations and loss of expression of netrin suggest that variation in netrin may be involved in cancer development. NA
serum amyloid A2 6289 ENSG00000134339 SAA2 NA NA
orosomucoid 1 5004 ENSG00000229314 ORM1 This gene encodes a key acute phase plasma protein. Because of its increase due to acute inflammation, this protein is classified as an acute-phase reactant. The specific function of this protein has not yet been determined; however, it may be involved in aspects of immunosuppression. NA
serine peptidase inhibitor, Kunitz type 1 6692 ENSG00000166145 SPINT1 The protein encoded by this gene is a member of the Kunitz family of serine protease inhibitors. The protein is a potent inhibitor specific for HGF activator and is thought to be involved in the regulation of the proteolytic activation of HGF in injured tissues. Alternative splicing results in multiple variants encoding different isoforms. NA
CCAAT/enhancer binding protein beta 1051 ENSG00000172216 CEBPB This intronless gene encodes a transcription factor that contains a basic leucine zipper (bZIP) domain. The encoded protein functions as a homodimer but can also form heterodimers with CCAAT/enhancer-binding proteins alpha, delta, and gamma. Activity of this protein is important in the regulation of genes involved in immune and inflammatory responses, among other processes. The use of alternative in-frame AUG start codons results in multiple protein isoforms, each with distinct biological functions. NA
NA ENSG00000269918 ENSG00000269918 AF131215.9 NA NA
aldehyde oxidase 1 316 ENSG00000138356 AOX1 Aldehyde oxidase produces hydrogen peroxide and, under certain conditions, can catalyze the formation of superoxide. Aldehyde oxidase is a candidate gene for amyotrophic lateral sclerosis. NA
platelet derived growth factor subunit A 5154 ENSG00000197461 PDGFA This gene encodes a member of the protein family comprised of both platelet-derived growth factors (PDGF) and vascular endothelial growth factors (VEGF). The encoded preproprotein is proteolytically processed to generate platelet-derived growth factor subunit A, which can homodimerize, or alternatively, heterodimerize with the related platelet-derived growth factor subunit B. These proteins bind and activate PDGF receptor tyrosine kinases, which play a role in a wide range of developmental processes. Alternative splicing results in multiple transcript variants. NA
isovaleryl-CoA dehydrogenase 3712 ENSG00000128928 IVD Isovaleryl-CoA dehydrogenase (IVD) is a mitochondrial matrix enzyme that catalyzes the third step in leucine catabolism. The genetic deficiency of IVD results in an accumulation of isovaleric acid, which is toxic to the central nervous system and leads to isovaleric acidemia. Alternatively spliced transcript variants encoding different isoforms have been found for this gene. NA
amyloid beta precursor like protein 1 333 ENSG00000105290 APLP1 This gene encodes a member of the highly conserved amyloid precursor protein gene family. The encoded protein is a membrane-associated glycoprotein that is cleaved by secretases in a manner similar to amyloid beta A4 precursor protein cleavage. This cleavage liberates an intracellular cytoplasmic fragment that may act as a transcriptional activator. The encoded protein may also play a role in synaptic maturation during cortical development. Alternatively spliced transcript variants encoding different isoforms have been described. NA
NA NA ENSG00000241732 NA NA TRUE
protein disulfide isomerase family A member 2 64714 ENSG00000185615 PDIA2 Protein disulfide isomerases (EC 5.3.4.1), such as PDIP, are endoplasmic reticulum (ER) resident proteins that catalyze protein folding and thiol-disulfide interchange reactions (Desilva et al., 1996 [PubMed 8561901]). NA
prostaglandin E synthase 3 (cytosolic)-like 100885848 ENSG00000267060 PTGES3L NA NA
tandem C2 domains, nuclear 123036 ENSG00000165929 TC2N NA NA
uncharacterized LOC105378272 105378272 ENSG00000230555 LOC105378272 NA NA
cytochrome p450 oxidoreductase 5447 ENSG00000127948 POR This gene encodes an endoplasmic reticulum membrane oxidoreductase with an FAD-binding domain and a flavodoxin-like domain. The protein binds two cofactors, FAD and FMN, which allow it to donate electrons directly from NADPH to all microsomal P450 enzymes. Mutations in this gene have been associated with various diseases, including apparent combined P450C17 and P450C21 deficiency, amenorrhea and disordered steroidogenesis, congenital adrenal hyperplasia and Antley-Bixler syndrome. NA
major facilitator superfamily domain containing 4A 148808 ENSG00000174514 MFSD4A NA NA
regulating synaptic membrane exocytosis 3 9783 ENSG00000117016 RIMS3 NA NA
RNA binding motif protein 11 54033 ENSG00000185272 RBM11 NA NA
immunoglobulin heavy constant alpha 1 ENSG00000211895 ENSG00000211895 IGHA1 NA NA
haptoglobin 3240 ENSG00000257017 HP This gene encodes a preproprotein, which is processed to yield both alpha and beta chains, which subsequently combine as a tetramer to produce haptoglobin. Haptoglobin functions to bind free plasma hemoglobin, which allows degradative enzymes to gain access to the hemoglobin, while at the same time preventing loss of iron through the kidneys and protecting the kidneys from damage by hemoglobin. Mutations in this gene and/or its regulatory regions cause ahaptoglobinemia or hypohaptoglobinemia. This gene has also been linked to diabetic nephropathy, the incidence of coronary artery disease in type 1 diabetes, Crohn’s disease, inflammatory disease behavior, primary sclerosing cholangitis, susceptibility to idiopathic Parkinson’s disease, and a reduced incidence of Plasmodium falciparum malaria. The protein encoded also exhibits antimicrobial activity against bacteria. A similar duplicated gene is located next to this gene on chromosome 16. Multiple transcript variants encoding different isoforms have been found for this gene. NA
dual specificity phosphatase 23 54935 ENSG00000158716 DUSP23 NA NA
allograft inflammatory factor 1 like 83543 ENSG00000126878 AIF1L NA NA
tumor protein p73 7161 ENSG00000078900 TP73 This gene encodes a member of the p53 family of transcription factors involved in cellular responses to stress and development. It maps to a region on chromosome 1p36 that is frequently deleted in neuroblastoma and other tumors, and thought to contain multiple tumor suppressor genes. The demonstration that this gene is monoallelically expressed (likely from the maternal allele), supports the notion that it is a candidate gene for neuroblastoma. Many transcript variants resulting from alternative splicing and/or use of alternate promoters have been found for this gene, but the biological validity and the full-length nature of some variants have not been determined. NA
uncharacterized LOC101930370 101930370 ENSG00000245213 LOC101930370 NA NA
plakophilin 3 11187 ENSG00000184363 PKP3 This gene encodes a member of the arm-repeat (armadillo) and plakophilin gene families. Plakophilin proteins contain numerous armadillo repeats, localize to cell desmosomes and nuclei, and participate in linking cadherins to intermediate filaments in the cytoskeleton. This protein may act in cellular desmosome-dependent adhesion and signaling pathways. Two transcript variants encoding different isoforms have been found for this gene. NA
polypeptide N-acetylgalactosaminyltransferase 7 51809 ENSG00000109586 GALNT7 This gene encodes GalNAc transferase 7, a member of the GalNAc-transferase family. The enzyme encoded by this gene controls the initiation step of mucin-type O-linked protein glycosylation and transfer of N-acetylgalactosamine to serine and threonine amino acid residues. This enzyme is a type II transmembrane protein and shares common sequence motifs with other family members. Unlike other family members, this enzyme shows exclusive specificity for partially GalNAc-glycosylated acceptor substrates and shows no activity with non-glycosylated peptides. This protein may function as a follow-up enzyme in the initiation step of O-glycosylation. NA
neurexophilin 3 11248 ENSG00000182575 NXPH3 NA NA
steroidogenic acute regulatory protein 6770 ENSG00000147465 STAR The protein encoded by this gene plays a key role in the acute regulation of steroid hormone synthesis by enhancing the conversion of cholesterol into pregnenolone. This protein permits the cleavage of cholesterol into pregnenolone by mediating the transport of cholesterol from the outer mitochondrial membrane to the inner mitochondrial membrane. Mutations in this gene are a cause of congenital lipoid adrenal hyperplasia (CLAH), also called lipoid CAH. A pseudogene of this gene is located on chromosome 13. NA
hydroxysteroid 11-beta dehydrogenase 1 3290 ENSG00000117594 HSD11B1 The protein encoded by this gene is a microsomal enzyme that catalyzes the conversion of the stress hormone cortisol to the inactive metabolite cortisone. In addition, the encoded protein can catalyze the reverse reaction, the conversion of cortisone to cortisol. Too much cortisol can lead to central obesity, and a particular variation in this gene has been associated with obesity and insulin resistance in children. Mutations in this gene and H6PD (hexose-6-phosphate dehydrogenase (glucose 1-dehydrogenase)) are the cause of cortisone reductase deficiency. Alternate splicing results in multiple transcript variants encoding the same protein. NA
placenta specific 8 51316 ENSG00000145287 PLAC8 NA NA
insulin receptor substrate 2 8660 ENSG00000185950 IRS2 This gene encodes the insulin receptor substrate 2, a cytoplasmic signaling molecule that mediates effects of insulin, insulin-like growth factor 1, and other cytokines by acting as a molecular adaptor between diverse receptor tyrosine kinases and downstream effectors. The product of this gene is phosphorylated by the insulin receptor tyrosine kinase upon receptor stimulation, as well as by an interleukin 4 receptor-associated kinase in response to IL4 treatment. NA
NA NA ENSG00000273281 NA NA TRUE
ribosomal protein S3a pseudogene 47 ENSG00000205871 ENSG00000205871 RPS3AP47 NA NA
leucine zipper and EF-hand containing transmembrane protein 2 137994 ENSG00000165046 LETM2 NA NA
hexose-6-phosphate dehydrogenase/glucose 1-dehydrogenase 9563 ENSG00000049239 H6PD There are 2 forms of glucose-6-phosphate dehydrogenase. G form is X-linked and H form, encoded by this gene, is autosomally linked. This H form shows activity with other hexose-6-phosphates, especially galactose-6-phosphate, whereas the G form is specific for glucose-6-phosphate. Both forms are present in most tissues, but H form is not found in red cells. NA
atypical chemokine receptor 1 (Duffy blood group) 2532 ENSG00000213088 ACKR1 The protein encoded by this gene is a glycosylated membrane protein and a non-specific receptor for several chemokines. The encoded protein is the receptor for the human malarial parasites Plasmodium vivax and Plasmodium knowlesi. Polymorphisms in this gene are the basis of the Duffy blood group system. Two transcript variants encoding different isoforms have been found for this gene. NA
NA NA ENSG00000204807 NA NA TRUE
asialoglycoprotein receptor 1 432 ENSG00000141505 ASGR1 This gene encodes a subunit of the asialoglycoprotein receptor. This receptor is a transmembrane protein that plays a critical role in serum glycoprotein homeostasis by mediating the endocytosis and lysosomal degradation of glycoproteins with exposed terminal galactose or N-acetylgalactosamine residues. The asialoglycoprotein receptor may facilitate hepatic infection by multiple viruses including hepatitis B, and is also a target for liver-specific drug delivery. The asialoglycoprotein receptor is a hetero-oligomeric protein composed of major and minor subunits, which are encoded by different genes. The protein encoded by this gene is the more abundant major subunit. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene. NA
potassium calcium-activated channel subfamily N member 4 3783 ENSG00000104783 KCNN4 The protein encoded by this gene is part of a potentially heterotetrameric voltage-independent potassium channel that is activated by intracellular calcium. Activation is followed by membrane hyperpolarization, which promotes calcium influx. The encoded protein may be part of the predominant calcium-activated potassium channel in T-lymphocytes. This gene is similar to other KCNN family potassium channel genes, but it differs enough to possibly be considered as part of a new subfamily. NA
heparin binding EGF like growth factor 1839 ENSG00000113070 HBEGF NA NA
myelin protein zero like 3 196264 ENSG00000160588 MPZL3 NA NA
dysbindin domain containing 2 55861 ENSG00000244274 DBNDD2 NA NA
potassium two pore domain channel subfamily K member 1 3775 ENSG00000135750 KCNK1 This gene encodes one of the members of the superfamily of potassium channel proteins containing two pore-forming P domains. The product of this gene has not been shown to be a functional channel, however, it may require other non-pore-forming proteins for activity. NA
serine peptidase inhibitor, Kazal type 1 6690 ENSG00000164266 SPINK1 The protein encoded by this gene is a trypsin inhibitor, which is secreted from pancreatic acinar cells into pancreatic juice. It is thought to function in the prevention of trypsin-catalyzed premature activation of zymogens within the pancreas and the pancreatic duct. Mutations in this gene are associated with hereditary pancreatitis and tropical calcific pancreatitis. NA
NA ENSG00000248774 ENSG00000248774 RP11-798M19.3 NA NA
NA ENSG00000264924 ENSG00000264924 RP11-799B12.2 NA NA
protease, serine 3 5646 ENSG00000010438 PRSS3 This gene encodes a trypsinogen, which is a member of the trypsin family of serine proteases. This enzyme is expressed in the brain and pancreas and is resistant to common trypsin inhibitors. It is active on peptide linkages involving the carboxyl group of lysine or arginine. This gene is localized to the locus of T cell receptor beta variable orphans on chromosome 9. Four transcript variants encoding different isoforms have been described for this gene. NA
regenerating family member 1 beta 5968 ENSG00000172023 REG1B This gene is a type I subclass member of the Reg gene family. The Reg gene family is a multigene family grouped into four subclasses, types I, II, III and IV based on the primary structures of the encoded proteins. This gene encodes a protein secreted by the exocrine pancreas that is highly similar to the REG1A protein. The related REG1A protein is associated with islet cell regeneration and diabetogenesis, and may be involved in pancreatic lithogenesis. Reg family members REG1A, REGL, PAP and this gene are tandemly clustered on chromosome 2p12 and may have arisen from the same ancestral gene by gene duplication. NA
glycerol-3-phosphate acyltransferase, mitochondrial 57678 ENSG00000119927 GPAM This gene encodes a mitochondrial enzyme which prefers saturated fatty acids as its substrate for the synthesis of glycerolipids. This metabolic pathway’s first step is catalyzed by the encoded enzyme. Two forms for this enzyme exist, one in the mitochondria and one in the endoplasmic reticulum. Two alternatively spliced transcript variants have been described for this gene. NA
chymotrypsin like elastase family member 3B 23436 ENSG00000219073 CELA3B Elastases form a subfamily of serine proteases that hydrolyze many proteins in addition to elastin. Humans have six elastase genes which encode the structurally similar proteins elastase 1, 2, 2A, 2B, 3A, and 3B. Unlike other elastases, elastase 3B has little elastolytic activity. Like most of the human elastases, elastase 3B is secreted from the pancreas as a zymogen and, like other serine proteases such as trypsin, chymotrypsin and kallikrein, it has a digestive function in the intestine. Elastase 3B preferentially cleaves proteins after alanine residues. Elastase 3B may also function in the intestinal transport and metabolism of cholesterol. Both elastase 3A and elastase 3B have been referred to as protease E and as elastase 1, and excretion of this protein in fecal material is frequently used as a measure of pancreatic function in clinical assays. NA
regenerating family member 1 alpha 5967 ENSG00000115386 REG1A This gene is a type I subclass member of the Reg gene family. The Reg gene family is a multigene family grouped into four subclasses, types I, II, III and IV, based on the primary structures of the encoded proteins. This gene encodes a protein that is secreted by the exocrine pancreas. It is associated with islet cell regeneration and diabetogenesis and may be involved in pancreatic lithogenesis. Reg family members REG1B, REGL, PAP and this gene are tandemly clustered on chromosome 2p12 and may have arisen from the same ancestral gene by gene duplication. NA
lysozyme 4069 ENSG00000090382 LYZ This gene encodes human lysozyme, whose natural substrate is the bacterial cell wall peptidoglycan (cleaving the beta[1-4]glycosidic linkages between N-acetylmuramic acid and N-acetylglucosamine). Lysozyme is one of the antimicrobial agents found in human milk, and is also present in spleen, lung, kidney, white blood cells, plasma, saliva, and tears. The protein has antibacterial activity against a number of bacterial species. Missense mutations in this gene have been identified in heritable renal amyloidosis. NA
retinol binding protein 4 5950 ENSG00000138207 RBP4 This protein belongs to the lipocalin family and is the specific carrier for retinol (vitamin A alcohol) in the blood. It delivers retinol from the liver stores to the peripheral tissues. In plasma, the RBP-retinol complex interacts with transthyretin which prevents its loss by filtration through the kidney glomeruli. A deficiency of vitamin A blocks secretion of the binding protein posttranslationally and results in defective delivery and supply to the epidermal cells. NA
NA ENSG00000263335 ENSG00000263335 AF001548.5 NA NA
apolipoprotein C3 345 ENSG00000110245 APOC3 Apolipoprotein C-III is a very low density lipoprotein (VLDL) protein. APOC3 inhibits lipoprotein lipase and hepatic lipase; it is thought to delay catabolism of triglyceride-rich particles. The APOA1, APOC3 and APOA4 genes are closely linked in both rat and human genomes. The A-I and A-IV genes are transcribed from the same strand, while the A-1 and C-III genes are convergently transcribed. An increase in apoC-III levels induces the development of hypertriglyceridemia. NA
NA ENSG00000244021 ENSG00000244021 RP11-50D9.1 NA NA
mucin 1, cell surface associated 4582 ENSG00000185499 MUC1 This gene encodes a membrane-bound protein that is a member of the mucin family. Mucins are O-glycosylated proteins that play an essential role in forming protective mucous barriers on epithelial surfaces. These proteins also play a role in intracellular signaling. This protein is expressed on the apical surface of epithelial cells that line the mucosal surfaces of many different tissues including lung, breast, stomach and pancreas. This protein is proteolytically cleaved into alpha and beta subunits that form a heterodimeric complex. The N-terminal alpha subunit functions in cell-adhesion and the C-terminal beta subunit is involved in cell signaling. Overexpression, aberrant intracellular localization, and changes in glycosylation of this protein have been associated with carcinomas. This gene is known to contain a highly polymorphic variable number tandem repeats (VNTR) domain. Alternate splicing results in multiple transcript variants. NA
zinc finger protein 738 ENSG00000172687 ENSG00000172687 ZNF738 NA NA
NA ENSG00000272512 ENSG00000272512 RP11-54O7.17 NA NA
WD repeat domain 76 79968 ENSG00000092470 WDR76 NA NA
joining chain of multimeric IgA and IgM 3512 ENSG00000132465 JCHAIN NA NA
activating transcription factor 5 22809 ENSG00000169136 ATF5 NA NA
SRY-box 4 6659 ENSG00000124766 SOX4 This intronless gene encodes a member of the SOX (SRY-related HMG-box) family of transcription factors involved in the regulation of embryonic development and in the determination of the cell fate. The encoded protein may act as a transcriptional regulator after forming a protein complex with other proteins, such as syndecan binding protein (syntenin). The protein may function in the apoptosis pathway leading to cell death as well as to tumorigenesis and may mediate downstream effects of parathyroid hormone (PTH) and PTH-related protein (PTHrP) in bone development. The solution structure has been resolved for the HMG-box of a similar mouse protein. NA
STE20-related kinase adaptor beta 55437 ENSG00000082146 STRADB This gene encodes a protein that belongs to the serine/threonine protein kinase STE20 subfamily. One of the active site residues in the protein kinase domain of this protein is altered, and it is thus a pseudokinase. This protein is a component of a complex involved in the activation of serine/threonine kinase 11, a master kinase that regulates cell polarity and energy-generating metabolism. This complex regulates the relocation of this kinase from the nucleus to the cytoplasm, and it is essential for G1 cell cycle arrest mediated by this kinase. The protein encoded by this gene can also interact with the X chromosome-linked inhibitor of apoptosis protein, and this interaction enhances the anti-apoptotic activity of this protein via the JNK1 signal transduction pathway. Two pseudogenes, located on chromosomes 1 and 7, have been found for this gene. Alternatively spliced transcript variants encoding different isoforms have been found for this gene. NA
vitronectin 7448 ENSG00000109072 VTN The protein encoded by this gene is a member of the pexin family. It is found in serum and tissues and promotes cell adhesion and spreading, inhibits the membrane-damaging effect of the terminal cytolytic complement pathway, and binds to several serpin serine protease inhibitors. It is a secreted protein and exists in either a single chain form or a clipped, two chain form held together by a disulfide bond. NA
threonine synthase like 2 55258 ENSG00000144115 THNSL2 This gene encodes a threonine synthase-like protein. A similar enzyme in mouse can catalyze the degradation of O-phospho-homoserine to a-ketobutyrate, phosphate, and ammonia. This protein also has phospho-lyase activity on both gamma and beta phosphorylated substrates. In mouse an alternatively spliced form of this protein has been shown to act as a cytokine and can induce the production of the inflammatory cytokine IL6 in osteoblasts. Alternate splicing results in multiple transcript variants. NA
protease, serine 1 5644 ENSG00000204983 PRSS1 This gene encodes a trypsinogen, which is a member of the trypsin family of serine proteases. This enzyme is secreted by the pancreas and cleaved to its active form in the small intestine. It is active on peptide linkages involving the carboxyl group of lysine or arginine. Mutations in this gene are associated with hereditary pancreatitis. This gene and several other trypsinogen genes are localized to the T cell receptor beta locus on chromosome 7. NA
sperm antigen with calponin homology and coiled-coil domains 1 92521 ENSG00000128487 SPECC1 The protein encoded by this gene belongs to the cytospin-A family. It is localized in the nucleus, and highly expressed in testis and some cancer cell lines. A chromosomal translocation involving this gene and platelet-derived growth factor receptor, beta gene (PDGFRB) may be a cause of juvenile myelomonocytic leukemia. Alternatively spliced transcript variants encoding different isoforms have been described for this gene. NA
ribosomal protein L7 pseudogene 19 ENSG00000241458 ENSG00000241458 RPL7P19 NA NA
troponin T2, cardiac type 7139 ENSG00000118194 TNNT2 The protein encoded by this gene is the tropomyosin-binding subunit of the troponin complex, which is located on the thin filament of striated muscles and regulates muscle contraction in response to alterations in intracellular calcium ion concentration. Mutations in this gene have been associated with familial hypertrophic cardiomyopathy as well as with dilated cardiomyopathy. Transcripts for this gene undergo alternative splicing that results in many tissue-specific isoforms, however, the full-length nature of some of these variants has not yet been determined. NA
microsomal glutathione S-transferase 1 4257 ENSG00000008394 MGST1 The MAPEG (Membrane Associated Proteins in Eicosanoid and Glutathione metabolism) family consists of six human proteins, two of which are involved in the production of leukotrienes and prostaglandin E, important mediators of inflammation. Other family members, demonstrating glutathione S-transferase and peroxidase activities, are involved in cellular defense against toxic, carcinogenic, and pharmacologically active electrophilic compounds. This gene encodes a protein that catalyzes the conjugation of glutathione to electrophiles and the reduction of lipid hydroperoxides. This protein is localized to the endoplasmic reticulum and outer mitochondrial membrane where it is thought to protect these membranes from oxidative stress. Several transcript variants, some non-protein coding and some protein coding, have been found for this gene. NA
small nucleolar RNA host gene 12 85028 ENSG00000197989 SNHG12 NA NA
# Save the queried Ensembl gene IDs for factor 17, one per line.
write.table(as.factor(out$query),
            paste0("../utilities/GTEX2013_sparse_fac_voom/gene_names_clus_", 17, ".txt"),
            col.names = FALSE, row.names = FALSE, quote = FALSE);
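
The annotation tables in this report come back with their columns in a different order from one factor to the next, and they include rows for Ensembl IDs that mygene could not resolve (`notfound` is `TRUE`). A minimal sketch of a helper that normalizes this before display is given below. It is not part of the original analysis: `tidy_annotations` is a hypothetical name, and the toy data frame only mimics the shape of a `queryMany()` result with placeholder summary text.

```r
# Hypothetical helper (an assumption, not the author's code): put the
# annotation columns in a fixed order and drop queries without a match.
tidy_annotations <- function(out) {
  df <- as.data.frame(out)
  # `notfound` only appears when at least one query failed; it is TRUE for
  # unresolved IDs and NA for resolved ones.
  if ("notfound" %in% names(df)) {
    df <- df[is.na(df$notfound), , drop = FALSE]
  }
  wanted <- c("query", "symbol", "name", "summary")
  df[, intersect(wanted, names(df)), drop = FALSE]
}

# Toy input shaped like a queryMany() result (summary text is a placeholder).
toy <- data.frame(
  summary  = c("(summary text)", NA),
  query    = c("ENSG00000099282", "ENSG00000272016"),
  symbol   = c("TSPAN15", NA),
  name     = c("tetraspanin 15", NA),
  notfound = c(NA, TRUE),
  stringsAsFactors = FALSE
)
tidy_toy <- tidy_annotations(toy)
```

In this report the result could be passed to `kable()` in place of `as.data.frame(out)`. Filtering on `notfound` is a design choice for display only; the `write.table()` call above exports every queried ID, including unresolved ones.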

Factor 18 Annotations

out <- mygene::queryMany(gene_list[18,],  scopes="ensembl.gene", fields=c("name", "summary", "symbol"), species="human");
## Finished
## Pass returnall=TRUE to return lists of duplicate or missing query terms.
kable(as.data.frame(out))
summary X_id query symbol name notfound
The protein encoded by this gene is a member of the transmembrane 4 superfamily, also known as the tetraspanin family. Most of these members are cell-surface proteins that are characterized by the presence of four hydrophobic domains. The proteins mediate signal transduction events that play a role in the regulation of cell development, activation, growth and motility. The use of alternate polyadenylation sites has been found for this gene. 23555 ENSG00000099282 TSPAN15 tetraspanin 15 NA
Tryptases comprise a family of trypsin-like serine proteases, the peptidase family S1. Tryptases are enzymatically active only as heparin-stabilized tetramers, and they are resistant to all known endogenous proteinase inhibitors. Several tryptase genes are clustered on chromosome 16p13.3. These genes are characterized by several distinct features. They have a highly conserved 3’ UTR and contain tandem repeat sequences at the 5’ flank and 3’ UTR which are thought to play a role in regulation of the mRNA stability. These genes have an intron immediately upstream of the initiator Met codon, which separates the site of transcription initiation from protein coding sequence. This feature is characteristic of tryptases but is unusual in other genes. The alleles of this gene exhibit an unusual amount of sequence variation, such that the alleles were once thought to represent two separate genes, alpha and beta 1. Beta tryptases appear to be the main isoenzymes expressed in mast cells; whereas in basophils, alpha tryptases predominate. Tryptases have been implicated as mediators in the pathogenesis of asthma and other allergic and inflammatory disorders. 7177 ENSG00000172236 TPSAB1 tryptase alpha/beta 1 NA
This gene encodes a preproprotein that is proteolytically processed to generate a secreted peptide that belongs to the endothelin/sarafotoxin family. This peptide is a potent vasoconstrictor and its cognate receptors are therapeutic targets in the treatment of pulmonary arterial hypertension. Aberrant expression of this gene may promote tumorigenesis. Alternative splicing results in multiple transcript variants. 1906 ENSG00000078401 EDN1 endothelin 1 NA
There are believed to be over 100 different glycosyltransferases involved in the synthesis of protein-bound and lipid-bound oligosaccharides. The enzyme encoded by this gene transfers a GlcNAc residue to the beta-linked mannose of the trimannosyl core of N-linked oligosaccharides and produces a bisecting GlcNAc. Multiple alternatively spliced variants, encoding the same protein, have been identified. 4248 ENSG00000128268 MGAT3 mannosyl (beta-1,4-)-glycoprotein beta-1,4-N-acetylglucosaminyltransferase NA
NA 114804 ENSG00000141576 RNF157 ring finger protein 157 NA
The protein encoded by this gene is a small secreted cysteine-rich protein and a member of the CCN family of regulatory proteins. CCN family proteins associate with the extracellular matrix and play an important role in cardiovascular and skeletal development, fibrosis and cancer development. 4856 ENSG00000136999 NOV nephroblastoma overexpressed NA
This gene encodes a secreted, homodimeric glycoprotein that is expressed in a wide variety of tissues and may have autocrine or paracrine functions. The encoded protein has 10 of its 15 cysteine residues conserved among stanniocalcin family members and is phosphorylated by casein kinase 2 exclusively on its serine residues. Its C-terminus contains a cluster of histidine residues which may interact with metal ions. The protein may play a role in the regulation of renal and intestinal calcium and phosphate transport, cell metabolism, or cellular calcium/phosphate homeostasis. Constitutive overexpression of human stanniocalcin 2 in mice resulted in pre- and postnatal growth restriction, reduced bone and skeletal muscle growth, and organomegaly. Expression of this gene is induced by estrogen and altered in some breast cancers. 8614 ENSG00000113739 STC2 stanniocalcin 2 NA
NA 11170 ENSG00000168309 FAM107A family with sequence similarity 107 member A NA
This gene encodes a member of the cytochrome P450 superfamily of enzymes. The cytochrome P450 proteins are monooxygenases which catalyze many reactions involved in drug metabolism and synthesis of cholesterol, steroids and other lipids. This protein localizes to the endoplasmic reticulum. It has both 17alpha-hydroxylase and 17,20-lyase activities and is a key enzyme in the steroidogenic pathway that produces progestins, mineralocorticoids, glucocorticoids, androgens, and estrogens. Mutations in this gene are associated with isolated steroid-17 alpha-hydroxylase deficiency, 17-alpha-hydroxylase/17,20-lyase deficiency, pseudohermaphroditism, and adrenal hyperplasia. 1586 ENSG00000148795 CYP17A1 cytochrome P450 family 17 subfamily A member 1 NA
The adhesion G-protein-coupled receptors (GPCRs), including GPR133, are membrane-bound proteins with long N termini containing multiple domains. GPCRs, or GPRs, contain 7 transmembrane domains and transduce extracellular signals through heterotrimeric G proteins (summary by Bjarnadottir et al., 2004 [PubMed 15203201]). 283383 ENSG00000111452 ADGRD1 adhesion G protein-coupled receptor D1 NA
NA 100127888 ENSG00000232803 SLCO4A1-AS1 SLCO4A1 antisense RNA 1 NA
NA 28231 ENSG00000101187 SLCO4A1 solute carrier organic anion transporter family member 4A1 NA
FABP4 encodes the fatty acid binding protein found in adipocytes. Fatty acid binding proteins are a family of small, highly conserved, cytoplasmic proteins that bind long-chain fatty acids and other hydrophobic ligands. It is thought that the roles of FABPs include fatty acid uptake, transport, and metabolism. 2167 ENSG00000170323 FABP4 fatty acid binding protein 4 NA
NA ENSG00000254429 ENSG00000254429 CTD-2562J17.7 NA NA
NA ENSG00000272789 ENSG00000272789 RP11-286H15.1 NA NA
The protein encoded by this gene belongs to the centaurin gamma-like family. It mediates anti-apoptotic effects of nerve growth factor by activating nuclear phosphoinositide 3-kinase. It is overexpressed in cancer cells, and promotes cancer cell invasion. Alternatively spliced transcript variants encoding different isoforms have been described for this gene. 116986 ENSG00000135439 AGAP2 ArfGAP with GTPase domain, ankyrin repeat and PH domain 2 NA
NA ENSG00000273055 ENSG00000273055 CTB-13F3.1 NA NA
STXBP6 binds components of the SNARE complex (see MIM 603215) and may be involved in regulating SNARE complex formation (Scales et al., 2002 [PubMed 12145319]). 29091 ENSG00000168952 STXBP6 syntaxin binding protein 6 NA
Cytochrome c oxidase (COX), the terminal enzyme of the mitochondrial respiratory chain, catalyzes the electron transfer from reduced cytochrome c to oxygen. It is a heteromeric complex consisting of 3 catalytic subunits encoded by mitochondrial genes and multiple structural subunits encoded by nuclear genes. The mitochondrially-encoded subunits function in electron transfer, and the nuclear-encoded subunits may be involved in the regulation and assembly of the complex. This nuclear gene encodes isoform 2 of subunit IV. Isoform 1 of subunit IV is encoded by a different gene, however, the two genes show a similar structural organization. Subunit IV is the largest nuclear encoded subunit which plays a pivotal role in COX regulation. 84701 ENSG00000131055 COX4I2 cytochrome c oxidase subunit 4I2 NA
NA ENSG00000267128 ENSG00000267128 RP11-449J21.5 NA NA
NA ENSG00000258999 ENSG00000258999 RP11-114N19.3 NA NA
This gene encodes a member of the hypoxia inducible gene 1 (HIG1) domain family. The encoded protein is localized to the cell membrane and has been linked to tumorigenesis and the progression of pituitary adenomas. Alternative splicing results in multiple transcript variants. 51751 ENSG00000131097 HIGD1B HIG1 hypoxia inducible domain family member 1B NA
NA 148808 ENSG00000174514 MFSD4A major facilitator superfamily domain containing 4A NA
This gene encodes a member of the cell death-inducing DNA fragmentation factor-like effector family. Members of this family play important roles in apoptosis. The encoded protein promotes lipid droplet formation in adipocytes and may mediate adipocyte apoptosis. This gene is regulated by insulin and its expression is positively correlated with insulin sensitivity. Mutations in this gene may contribute to insulin resistant diabetes. A pseudogene of this gene is located on the short arm of chromosome 3. Alternatively spliced transcript variants that encode different isoforms have been observed for this gene. 63924 ENSG00000187288 CIDEC cell death inducing DFFA like effector c NA
NA 100873993 ENSG00000239799 ITIH4-AS1 ITIH4 antisense RNA 1 NA
NA 80150 ENSG00000162174 ASRGL1 asparaginase like 1 NA
Integrins are heterodimers comprised of alpha and beta subunits, that are noncovalently associated transmembrane glycoprotein receptors. Different combinations of alpha and beta polypeptides form complexes that vary in their ligand-binding specificities. Integrins mediate cell-matrix or cell-cell adhesion, and transduced signals that regulate gene expression and cell growth. This gene encodes the integrin beta 4 subunit, a receptor for the laminins. This subunit tends to associate with alpha 6 subunit and is likely to play a pivotal role in the biology of invasive carcinoma. Mutations in this gene are associated with epidermolysis bullosa with pyloric atresia. Multiple alternatively spliced transcript variants encoding distinct isoforms have been found for this gene. 3691 ENSG00000132470 ITGB4 integrin subunit beta 4 NA
NA NA ENSG00000272016 NA NA TRUE
This gene encodes the alpha chain of type XVIII collagen. This collagen is one of the multiplexins, extracellular matrix proteins that contain multiple triple-helix domains (collagenous domains) interrupted by non-collagenous domains. A long isoform of the protein has an N-terminal domain that is homologous to the extracellular part of frizzled receptors. Proteolytic processing at several endogenous cleavage sites in the C-terminal domain results in production of endostatin, a potent antiangiogenic protein that is able to inhibit angiogenesis and tumor growth. Mutations in this gene are associated with Knobloch syndrome. The main features of this syndrome involve retinal abnormalities, so type XVIII collagen may play an important role in retinal structure and in neural tube closure. Alternative splicing results in multiple transcript variants. 80781 ENSG00000182871 COL18A1 collagen type XVIII alpha 1 chain NA
NA ENSG00000255126 ENSG00000255126 CTD-2531D15.5 NA NA
NA 221711 ENSG00000153157 SYCP2L synaptonemal complex protein 2 like NA
NA ENSG00000259352 ENSG00000259352 RP11-109D20.2 NA NA
This gene encodes a member of the protein family comprised of both platelet-derived growth factors (PDGF) and vascular endothelial growth factors (VEGF). The encoded preproprotein is proteolytically processed to generate platelet-derived growth factor subunit B, which can homodimerize, or alternatively, heterodimerize with the related platelet-derived growth factor subunit A. These proteins bind and activate PDGF receptor tyrosine kinases, which play a role in a wide range of developmental processes. Mutations in this gene are associated with meningioma. Reciprocal translocations between chromosomes 22 and 17, at sites where this gene and that for collagen type 1, alpha 1 are located, are associated with dermatofibrosarcoma protuberans, a rare skin tumor. Alternative splicing results in multiple transcript variants. 5155 ENSG00000100311 PDGFB platelet derived growth factor subunit B NA
NA ENSG00000255118 ENSG00000255118 RP11-703H8.7 NA NA
The protein encoded by this gene stimulates the activity of several transcription factors and nuclear receptors, including estrogen receptor alpha, nuclear respiratory factor 1, and glucocorticoid receptor. The encoded protein may be involved in fat oxidation, non-oxidative glucose metabolism, and the regulation of energy expenditure. This protein is downregulated in prediabetic and type 2 diabetes mellitus patients. Certain allelic variations in this gene increase the risk of the development of obesity. Three transcript variants encoding different isoforms have been found for this gene. 133522 ENSG00000155846 PPARGC1B PPARG coactivator 1 beta NA
NA 57558 ENSG00000118369 USP35 ubiquitin specific peptidase 35 NA
NA 150763 ENSG00000186281 GPAT2 glycerol-3-phosphate acyltransferase 2, mitochondrial NA
NA ENSG00000259479 ENSG00000259479 SORD2P sorbitol dehydrogenase 2, pseudogene NA
Major alterations in the composition of the cartilage extracellular matrix occur in joint disease, such as osteoarthrosis. This gene encodes the cartilage intermediate layer protein (CILP), which increases in early osteoarthrosis cartilage. The encoded protein was thought to encode a protein precursor for two different proteins; an N-terminal CILP and a C-terminal homolog of NTPPHase, however, later studies identified no nucleotide pyrophosphatase phosphodiesterase (NPP) activity. The full-length and the N-terminal domain of this protein was shown to function as an IGF-1 antagonist. An allelic variant of this gene has been associated with lumbar disc disease. 8483 ENSG00000138615 CILP cartilage intermediate layer protein NA
This gene encodes a bifunctional signal transduction molecule. Dopaminergic and glutamatergic receptor stimulation regulates its phosphorylation and function as a kinase or phosphatase inhibitor. As a target for dopamine, this gene may serve as a therapeutic target for neurologic and psychiatric disorders. Multiple transcript variants encoding different isoforms have been found for this gene. 84152 ENSG00000131771 PPP1R1B protein phosphatase 1 regulatory inhibitor subunit 1B NA
NA ENSG00000237276 ENSG00000237276 ANO7P1 anoctamin 7 pseudogene 1 NA
Members of the CELF/BRUNOL protein family contain two N-terminal RNA recognition motif (RRM) domains, one C-terminal RRM domain, and a divergent segment of 160-230 aa between the second and third RRM domains. Members of this protein family regulate pre-mRNA alternative splicing and may also be involved in mRNA editing, and translation. Alternative splicing results in multiple transcript variants encoding different isoforms. 10659 ENSG00000048740 CELF2 CUGBP, Elav-like family member 2 NA
NA ENSG00000259772 ENSG00000259772 RP11-16E12.2 NA NA
This gene encodes a member of the cytochrome P450 superfamily of enzymes. The cytochrome P450 proteins are monooxygenases which catalyze many reactions involved in drug metabolism and synthesis of cholesterol, steroids and other lipids. This protein localizes to the mitochondrial inner membrane and is involved in the conversion of progesterone to cortisol in the adrenal cortex. Mutations in this gene cause congenital adrenal hyperplasia due to 11-beta-hydroxylase deficiency. Transcript variants encoding different isoforms have been noted for this gene. 1584 ENSG00000160882 CYP11B1 cytochrome P450 family 11 subfamily B member 1 NA
NA ENSG00000267396 ENSG00000267396 RP11-845C23.3 NA NA
The protein encoded by this gene is a member of the keratin gene family. The keratins are intermediate filament proteins responsible for the structural integrity of epithelial cells and are subdivided into cytokeratins and hair keratins. Most of the type I cytokeratins consist of acidic proteins which are arranged in pairs of heterotypic keratin chains and are clustered in a region of chromosome 17q12-q21. This keratin has been coexpressed with keratin 14 in a number of epithelial tissues, including esophagus, tongue, and hair follicles. Mutations in this gene are associated with type 1 pachyonychia congenita, non-epidermolytic palmoplantar keratoderma and unilateral palmoplantar verrucous nevus. 3868 ENSG00000186832 KRT16 keratin 16 NA
NA 359845 ENSG00000183688 FAM101B family with sequence similarity 101 member B NA
NA 5140 ENSG00000152270 PDE3B phosphodiesterase 3B NA
The protein encoded by this gene belongs to the innexin family. Innexin family members are the structural components of gap junctions. This protein and pannexin 1 are abundantly expressed in central nervous system (CNS) and are coexpressed in various neuronal populations. Studies in Xenopus oocytes suggest that this protein alone and in combination with pannexin 1 may form cell type-specific gap junctions with distinct properties. Multiple transcript variants encoding different isoforms have been found for this gene. 56666 ENSG00000073150 PANX2 pannexin 2 NA
This gene encodes a member of a family of proteins that function as negative regulators of Wnt receptor signaling through interaction with Dishevelled family members. The encoded protein participates in the delivery of transforming growth factor alpha-containing vesicles to the cell membrane. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene. 85409 ENSG00000145506 NKD2 naked cuticle homolog 2 NA
The protein encoded by this gene is a cytokine receptor that belongs to the interleukin 1 receptor family. This receptor specifically binds interleukin 18 (IL18), and is essential for IL18-mediated signal transduction. IFN-alpha and IL12 are reported to induce the expression of this receptor in NK and T cells. This gene, along with four other members of the interleukin 1 receptor family, including IL1R2, IL1R1, IL1RL2 (IL-1Rrp2), and IL1RL1 (T1/ST2), forms a gene cluster on chromosome 2q. Alternatively spliced transcript variants encoding different isoforms have been found for this gene. 8809 ENSG00000115604 IL18R1 interleukin 18 receptor 1 NA
NA 441869 ENSG00000235098 ANKRD65 ankyrin repeat domain 65 NA
NA ENSG00000225972 ENSG00000225972 MTND1P23 mitochondrially encoded NADH:ubiquinone oxidoreductase core subunit 1 pseudogene 23 NA
Steroid 5-alpha-reductase (EC 1.3.99.5) catalyzes the conversion of testosterone into the more potent androgen, dihydrotestosterone (DHT). Also see SRD5A2 (MIM 607306). 6715 ENSG00000145545 SRD5A1 steroid 5 alpha-reductase 1 NA
The protein encoded by this gene is a secretory protein that contains a hyaluronan-binding domain, and thus is a member of the hyaluronan-binding protein family. The hyaluronan-binding domain is known to be involved in extracellular matrix stability and cell migration. This protein has been shown to form a stable complex with inter-alpha-inhibitor (I alpha I), and thus enhance the serine protease inhibitory activity of I alpha I, which is important in the protease network associated with inflammation. This gene can be induced by proinflammatory cytokines such as tumor necrosis factor alpha and interleukin-1. Enhanced levels of this protein are found in the synovial fluid of patients with osteoarthritis and rheumatoid arthritis. 7130 ENSG00000123610 TNFAIP6 TNF alpha induced protein 6 NA
NA ENSG00000272077 ENSG00000272077 RP11-348P10.2 NA NA
This gene encodes a glycoprotein involved in hemostasis. The encoded preproprotein is proteolytically processed following assembly into large multimeric complexes. These complexes function in the adhesion of platelets to sites of vascular injury and the transport of various proteins in the blood. Mutations in this gene result in von Willebrand disease, an inherited bleeding disorder. An unprocessed pseudogene has been found on chromosome 22. 7450 ENSG00000110799 VWF von Willebrand factor NA
This gene is one of several genes encoding pulmonary-surfactant associated proteins (SFTPA) located on chromosome 10. Mutations in this gene and a highly similar gene located nearby, which affect the highly conserved carbohydrate recognition domain, are associated with idiopathic pulmonary fibrosis. The current version of the assembly displays only a single centromeric SFTPA gene pair rather than the two gene pairs shown in the previous assembly which were thought to have resulted from a duplication. 729238 ENSG00000185303 SFTPA2 surfactant protein A2 NA
The protein encoded by this gene is an adenosine receptor that belongs to the G-protein coupled receptor 1 family. There are 3 types of adenosine receptors, each with a specific pattern of ligand binding and tissue distribution, and together they regulate a diverse set of physiologic functions. The type A1 receptors inhibit adenylyl cyclase, and play a role in the fertilization process. Animal studies also suggest a role for A1 receptors in kidney function and ethanol intoxication. Transcript variants with alternative splicing in the 5’ UTR have been found for this gene. 134 ENSG00000163485 ADORA1 adenosine A1 receptor NA
This gene encodes a nuclear protein belonging to the hairy and enhancer of split-related (HESR) family of basic helix-loop-helix (bHLH)-type transcriptional repressors. Expression of this gene is induced by the Notch and c-Jun signal transduction pathways. Two similar and redundant genes in mouse are required for embryonic cardiovascular development, and are also implicated in neurogenesis and somitogenesis. Alternative splicing results in multiple transcript variants. 23462 ENSG00000164683 HEY1 hes related family bHLH transcription factor with YRPW motif 1 NA
NA 84541 ENSG00000163376 KBTBD8 kelch repeat and BTB domain containing 8 NA
NA 8503 ENSG00000117461 PIK3R3 phosphoinositide-3-kinase regulatory subunit 3 NA
NA 220963 ENSG00000165449 SLC16A9 solute carrier family 16 member 9 NA
NA NA ENSG00000257499 NA NA TRUE
Sorbitol dehydrogenase (SORD; EC 1.1.1.14) catalyzes the interconversion of polyols and their corresponding ketoses, and together with aldose reductase (ALDR1; MIM 103880), makes up the sorbitol pathway that is believed to play an important role in the development of diabetic complications (summarized by Carr and Markham, 1995 [PubMed 8535074]). The first reaction of the pathway (also called the polyol pathway) is the reduction of glucose to sorbitol by ALDR1 with NADPH as the cofactor. SORD then oxidizes the sorbitol to fructose using NAD(+) cofactor. 6652 ENSG00000140263 SORD sorbitol dehydrogenase NA
NA 400684 ENSG00000267213 LOC400684 uncharacterized LOC400684 NA
The protein encoded by this gene has a long and a short form, generated by use of alternative translational start codons. The long form is expressed in steroidogenic tissues such as testis, where it converts cholesteryl esters to free cholesterol for steroid hormone production. The short form is expressed in adipose tissue, among others, where it hydrolyzes stored triglycerides to free fatty acids. 3991 ENSG00000079435 LIPE lipase E, hormone sensitive type NA
This gene encodes a protein that contains several helicase family domains. Mutations in this gene have been found in some patients with the CHARGE syndrome. Two transcript variants encoding different isoforms have been found for this gene. 55636 ENSG00000171316 CHD7 chromodomain helicase DNA binding protein 7 NA
The protein encoded by this gene is a cell membrane protein that may be involved in iron export from duodenal epithelial cells. Defects in this gene are a cause of hemochromatosis type 4 (HFE4). 30061 ENSG00000138449 SLC40A1 solute carrier family 40 member 1 NA
NA ENSG00000217648 ENSG00000217648 RP1-95L4.4 NA NA
This gene encodes a member of a family of proteins that contain coiled-coil domains and may form hetero- or homomers. The encoded protein is involved in cell proliferation and calcium signaling. It also interacts with the mitogen-activated protein kinase kinase kinase 5 (MAP3K5/ASK1) and positively regulates MAP3K5-induced apoptosis. Multiple alternatively spliced transcript variants have been observed. 7164 ENSG00000111907 TPD52L1 tumor protein D52-like 1 NA
NA ENSG00000262663 ENSG00000262663 RP11-497H17.1 NA NA
This gene belongs to the short chain dehydrogenase/reductase superfamily. It encodes a reductase enzyme involved in the first step of wax biosynthesis wherein fatty acids are converted to fatty alcohols. The encoded peroxisomal protein utilizes saturated fatty acids of 16 or 18 carbons as preferred substrates. Alternatively spliced transcript variants have been observed for this gene. Related pseudogenes have been identified on chromosomes 2, 14 and 22. 55711 ENSG00000064763 FAR2 fatty acyl-CoA reductase 2 NA
This gene encodes the enzyme responsible for hydrolysis of both HIBYL-CoA and beta-hydroxypropionyl-CoA. Mutations in this gene have been associated with 3-hydroxyisobutyryl-CoA hydrolase deficiency. Alternative splicing results in multiple transcript variants. 26275 ENSG00000198130 HIBCH 3-hydroxyisobutyryl-CoA hydrolase NA
NA 113115 ENSG00000146410 MTFR2 mitochondrial fission regulator 2 NA
NA 55228 ENSG00000182013 PNMAL1 paraneoplastic Ma antigen family-like 1 NA
NA NA ENSG00000175898 NA NA TRUE
The protein encoded by this gene mediates sodium and chloride transport and reabsorption. The encoded protein is a membrane protein and is important in maintaining proper ionic balance and cell volume. This protein is phosphorylated in response to DNA damage. Three transcript variants encoding two different isoforms have been found for this gene. 6558 ENSG00000064651 SLC12A2 solute carrier family 12 member 2 NA
This gene encodes a member of the NipSnap family of proteins that may be involved in vesicular transport. A similar protein in mice inhibits the calcium channel TRPV6, and is also localized to the inner mitochondrial membrane where it may play a role in mitochondrial DNA maintenance. A pseudogene of this gene is located on the short arm of chromosome 17. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene. 8508 ENSG00000184117 NIPSNAP1 nipsnap homolog 1 (C. elegans) NA
This gene encodes a member of the tumor necrosis factor receptor superfamily. The encoded protein activates nuclear factor kappa-B and mitogen-activated protein kinase 8 (also called c-Jun N-terminal kinase 1), and induces cell apoptosis. Through its death domain, the encoded receptor interacts with tumor necrosis factor receptor type 1-associated death domain (TRADD) protein, which is known to mediate signal transduction of tumor necrosis factor receptors. Knockout studies in mice suggest that this gene plays a role in T-helper cell activation, and may be involved in inflammation and immune regulation. 27242 ENSG00000146072 TNFRSF21 tumor necrosis factor receptor superfamily member 21 NA
NA ENSG00000272668 ENSG00000272668 RP11-190A12.8 NA NA
NA ENSG00000250899 ENSG00000250899 RP11-253E3.3 NA NA
NA 28978 ENSG00000096092 TMEM14A transmembrane protein 14A NA
NA 84866 ENSG00000149582 TMEM25 transmembrane protein 25 NA
This gene encodes a member of the Nedd4 family of HECT domain E3 ubiquitin ligases. HECT domain E3 ubiquitin ligases transfer ubiquitin from E2 ubiquitin-conjugating enzymes to protein substrates, thus targeting specific proteins for lysosomal degradation. The encoded protein mediates the ubiquitination of multiple target substrates and plays a critical role in epithelial sodium transport by regulating the cell surface expression of the epithelial sodium channel, ENaC. Single nucleotide polymorphisms in this gene may be associated with essential hypertension. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene. 23327 ENSG00000049759 NEDD4L neural precursor cell expressed, developmentally down-regulated 4-like, E3 ubiquitin protein ligase NA
Histones are basic nuclear proteins that are responsible for the nucleosome structure of the chromosomal fiber in eukaryotes. Nucleosomes consist of approximately 146 bp of DNA wrapped around a histone octamer composed of pairs of each of the four core histones (H2A, H2B, H3, and H4). The chromatin fiber is further compacted through the interaction of a linker histone, H1, with the DNA between the nucleosomes to form higher order chromatin structures. This gene is intronless and encodes a replication-dependent histone that is a member of the histone H2B family. Two transcripts that encode the same protein have been identified for this gene, which is found in the large histone gene cluster on chromosome 6p22-p21.3. 3017 ENSG00000158373 HIST1H2BD histone cluster 1, H2bd NA
NA ENSG00000256072 ENSG00000256072 RP11-335I12.2 NA NA
IGSF4B is a brain-specific protein related to the calcium-independent cell-cell adhesion molecules known as nectins (see PVRL3; MIM 607147) (Kakunaga et al., 2005 [PubMed 15741237]). 57863 ENSG00000162706 CADM3 cell adhesion molecule 3 NA
NA ENSG00000251196 ENSG00000251196 RP11-54F2.1 NA NA
The protein encoded by this gene belongs to the thrombospondin protein family. Thrombospondin family members are adhesive glycoproteins that mediate cell-to-cell and cell-to-matrix interactions. This protein forms a pentamer and can bind to heparin and calcium. It is involved in local signaling in the developing and adult nervous system, and it contributes to spinal sensitization and neuropathic pain states. This gene is activated during the stromal response to invasive breast cancer. It may also play a role in inflammatory responses in Alzheimer’s disease. Alternative splicing results in multiple transcript variants. 7060 ENSG00000113296 THBS4 thrombospondin 4 NA
RGMB is a glycosylphosphatidylinositol (GPI)-anchored member of the repulsive guidance molecule family (see RGMA, MIM 607362) and contributes to the patterning of the developing nervous system (Samad et al., 2005 [PubMed 15671031]). 285704 ENSG00000174136 RGMB repulsive guidance molecule family member b NA
This gene encodes a member of the KH-domain protein subfamily. Proteins of this subfamily, also referred to as alpha-CPs, bind to RNA with a specificity for C-rich pyrimidine regions. Alpha-CPs play important roles in post-transcriptional activities and have different cellular distributions. This gene’s protein is found in the cytoplasm, yet it lacks the nuclear localization signals found in other subfamily members. Alternative splicing results in multiple transcript variants encoding distinct isoforms. 54039 ENSG00000183570 PCBP3 poly(rC) binding protein 3 NA
NA ENSG00000261428 ENSG00000261428 RP11-16P6.1 NA NA
Tight junctions represent one mode of cell-to-cell adhesion in epithelial or endothelial cell sheets, forming continuous seals around cells and serving as a physical barrier to prevent solutes and water from passing freely through the paracellular space. These junctions are comprised of sets of continuous networking strands in the outwardly facing cytoplasmic leaflet, with complementary grooves in the inwardly facing extracytoplasmic leaflet. The protein encoded by this gene, a member of the claudin family, is an integral membrane protein and a component of tight junction strands. Loss of function mutations result in neonatal ichthyosis-sclerosing cholangitis syndrome. 9076 ENSG00000163347 CLDN1 claudin 1 NA
NA 100289388 ENSG00000246174 KCTD21-AS1 KCTD21 antisense RNA 1 NA
NA ENSG00000271833 ENSG00000271833 RP11-356B19.11 NA NA
The protein encoded by this gene is a member of the ros/insulin receptor family of tyrosine kinases. Tyrosine-specific phosphorylation of proteins is a key to the control of diverse pathways leading to cell growth and differentiation. Multiple transcript variants encoding different isoforms have been found for this gene. 4058 ENSG00000062524 LTK leukocyte receptor tyrosine kinase NA
The protein encoded by this gene is a cis-Golgi transmembrane protein that may be necessary for the long-term survival of nociceptive and autonomic ganglion neurons. Mutations in this gene are a cause of hereditary sensory and autonomic neuropathy type IIB (HSAN IIB), and this gene may also play a role in susceptibility to vascular dementia. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene. 54463 ENSG00000154153 FAM134B family with sequence similarity 134 member B NA
This gene encodes a subunit of a cytokine that acts on T and natural killer cells, and has a broad array of biological activities. The cytokine is a disulfide-linked heterodimer composed of the 35-kD subunit encoded by this gene, and a 40-kD subunit that is a member of the cytokine receptor family. This cytokine is required for the T-cell-independent induction of interferon (IFN)-gamma, and is important for the differentiation of both Th1 and Th2 cells. The responses of lymphocytes to this cytokine are mediated by the activator of transcription protein STAT4. Nitric oxide synthase 2A (NOS2A/NOS2) is found to be required for the signaling process of this cytokine in innate immunity. 3592 ENSG00000168811 IL12A interleukin 12A NA
This gene encodes a member of the F-box protein family, members of which are characterized by an approximately 40 amino acid motif, the F-box. The F-box proteins constitute one of the four subunits of ubiquitin protein ligase complex called SCFs (SKP1-cullin-F-box), which function in phosphorylation-dependent ubiquitination. The F-box proteins are divided into three classes: Fbws containing WD-40 domains, Fbls containing leucine-rich repeats, and Fbxs containing either different protein-protein interaction modules or no recognizable motifs. The protein encoded by this gene belongs to the Fbx class. Multiple transcript variants encoding different isoforms have been found for this gene. 157574 ENSG00000214050 FBXO16 F-box protein 16 NA
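Rows in the table above whose final notfound field is TRUE (for example ENSG00000257499 and ENSG00000175898) are Ensembl IDs for which the query returned no record. The column order of the kable output also differs from one factor to the next. A minimal sketch of how such rows could be dropped and the columns put in a fixed order before calling kable follows. The function name tidy_annotations and the toy data frame are hypothetical and not part of the original analysis, and the sketch assumes that notfound is TRUE for missing IDs and NA otherwise, as in the tables shown here.

```r
# Hypothetical helper (not part of the original analysis): drop IDs that were
# not found and return the annotation columns in a fixed order.
tidy_annotations <- function(df, cols = c("query", "symbol", "name", "summary")) {
  if ("notfound" %in% names(df)) {
    # Assumption: notfound is TRUE for missing IDs and NA for all other rows.
    df <- df[!(df$notfound %in% TRUE), , drop = FALSE]
  }
  # intersect() keeps the order given in cols and skips absent columns.
  df[, intersect(cols, names(df)), drop = FALSE]
}

# Toy stand-in for as.data.frame(out), built from two rows of the table above.
annot <- data.frame(
  summary  = c(NA, NA),
  X_id     = c("5140", NA),
  query    = c("ENSG00000152270", "ENSG00000257499"),
  symbol   = c("PDE3B", NA),
  name     = c("phosphodiesterase 3B", NA),
  notfound = c(NA, TRUE),
  stringsAsFactors = FALSE
)
tidy <- tidy_annotations(annot)
print(tidy)
```

With the real query result, the call would be kable(tidy_annotations(as.data.frame(out))).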
# Save the Ensembl IDs queried for factor 18, one per line.
write.table(as.factor(out$query), paste0("../utilities/GTEX2013_sparse_fac_voom/gene_names_clus_",18,".txt"), col.names = FALSE,
            row.names=FALSE, quote=FALSE);
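The write step is repeated once per factor with only the factor index changing. As a sketch, it could be wrapped in a small helper. The names write_factor_genes and out_dir are hypothetical, and writeLines is swapped in for write.table here; for a vector of IDs it should give the same one-ID-per-line file.

```r
# Hypothetical helper: write one factor's gene IDs to
# <out_dir>/gene_names_clus_<factor_index>.txt, one ID per line.
write_factor_genes <- function(ids, factor_index, out_dir) {
  path <- file.path(out_dir, paste0("gene_names_clus_", factor_index, ".txt"))
  writeLines(as.character(ids), path)
  invisible(path)  # return the path so callers can check or reuse it
}

# Example with two Ensembl IDs from this analysis, written to a temporary directory.
ids <- c("ENSG00000180209", "ENSG00000130598")
path <- write_factor_genes(ids, 19, tempdir())
readLines(path)
```

In the analysis itself the equivalent call would be write_factor_genes(out$query, 18, "../utilities/GTEX2013_sparse_fac_voom").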

Factor 19 Annotations

out <- mygene::queryMany(gene_list[19,],  scopes="ensembl.gene", fields=c("name", "summary", "symbol"), species="human");
## Finished
## Pass returnall=TRUE to return lists of duplicate or missing query terms.
kable(as.data.frame(out))
name X_id symbol query summary notfound
myosin light chain, phosphorylatable, fast skeletal muscle 29895 MYLPF ENSG00000180209 NA NA
troponin I2, fast skeletal type 7136 TNNI2 ENSG00000130598 This gene encodes a fast-twitch skeletal muscle protein, a member of the troponin I gene family, and a component of the troponin complex including troponin T, troponin C and troponin I subunits. The troponin complex, along with tropomyosin, is responsible for the calcium-dependent regulation of striated muscle contraction. Mouse studies show that this component is also present in vascular smooth muscle and may play a role in regulation of smooth muscle function. In addition to muscle tissues, this protein is found in corneal epithelium, cartilage where it is an inhibitor of angiogenesis to inhibit tumor growth and metastasis, and mammary gland where it functions as a co-activator of estrogen receptor-related receptor alpha. This protein also suppresses tumor growth in human ovarian carcinoma. Mutations in this gene cause myopathy and distal arthrogryposis type 2B. Alternatively spliced transcript variants have been found for this gene. NA
myosin, heavy chain 1, skeletal muscle, adult 4619 MYH1 ENSG00000109061 Myosin is a major contractile protein which converts chemical energy into mechanical energy through the hydrolysis of ATP. Myosin is a hexameric protein composed of a pair of myosin heavy chains (MYH) and two pairs of nonidentical light chains. Myosin heavy chains are encoded by a multigene family. In mammals at least 10 different myosin heavy chain (MYH) isoforms have been described from striated, smooth, and nonmuscle cells. These isoforms show expression that is spatially and temporally regulated during development. NA
protein phosphatase 1 regulatory subunit 27 116729 PPP1R27 ENSG00000182676 NA NA
cerebral dopamine neurotrophic factor 441549 CDNF ENSG00000185267 NA NA
myosin light chain 1 4632 MYL1 ENSG00000168530 Myosin is a hexameric ATPase cellular motor protein. It is composed of two heavy chains, two nonphosphorylatable alkali light chains, and two phosphorylatable regulatory light chains. This gene encodes a myosin alkali light chain expressed in fast skeletal muscle. Two transcript variants have been identified for this gene. NA
NA ENSG00000260500 CTD-3193O13.1 ENSG00000260500 NA NA
transforming growth factor beta receptor 3 like 100507588 TGFBR3L ENSG00000260001 NA NA
ADAM metallopeptidase domain 19 8728 ADAM19 ENSG00000135074 This gene encodes a member of the ADAM (a disintegrin and metalloprotease domain) family. Members of this family are membrane-anchored proteins structurally related to snake venom disintegrins and have been implicated in a variety of biological processes involving cell-cell and cell-matrix interactions, including fertilization, muscle development, and neurogenesis. This member is a type I transmembrane protein and serves as a marker for dendritic cell differentiation. It has been demonstrated to be an active metalloproteinase, which may be involved in normal physiological processes such as cell migration, cell adhesion, cell-cell and cell-matrix interactions, and signal transduction. It is proposed to play a role in pathological processes, such as cancer, inflammatory diseases, renal diseases, and Alzheimer’s disease. NA
myosin binding protein C, fast type 4606 MYBPC2 ENSG00000086967 This gene encodes a member of the myosin-binding protein C family. This family includes the fast-, slow- and cardiac-type isoforms, each of which is a myosin-associated protein found in the cross-bridge-bearing zone (C region) of A bands in striated muscle. The protein encoded by this locus is referred to as the fast-type isoform. Mutations in the related but distinct genes encoding the slow-type and cardiac-type isoforms have been associated with distal arthrogryposis, type 1 and hypertrophic cardiomyopathy, respectively. NA
ATPase sarcoplasmic/endoplasmic reticulum Ca2+ transporting 1 487 ATP2A1 ENSG00000196296 This gene encodes one of the SERCA Ca(2+)-ATPases, which are intracellular pumps located in the sarcoplasmic or endoplasmic reticula of muscle cells. This enzyme catalyzes the hydrolysis of ATP coupled with the translocation of calcium from the cytosol to the sarcoplasmic reticulum lumen, and is involved in muscular excitation and contraction. Mutations in this gene cause some autosomal recessive forms of Brody disease, characterized by increasing impairment of muscular relaxation during exercise. Alternative splicing results in three transcript variants encoding different isoforms. NA
family with sequence similarity 83 member D 81610 FAM83D ENSG00000101447 NA NA
CA3 antisense RNA 1 100996348 CA3-AS1 ENSG00000253549 NA NA
uncharacterized LOC100507537 100507537 LOC100507537 ENSG00000240045 NA NA
carbonic anhydrase 3 761 CA3 ENSG00000164879 Carbonic anhydrase III (CAIII) is a member of a multigene family (at least six separate genes are known) that encodes carbonic anhydrase isozymes. These carbonic anhydrases are a class of metalloenzymes that catalyze the reversible hydration of carbon dioxide and are differentially expressed in a number of cell types. The expression of the CA3 gene is strictly tissue specific and present at high levels in skeletal muscle and much lower levels in cardiac and smooth muscle. A proportion of carriers of Duchenne muscular dystrophy have a higher CA3 level than normal. The gene spans 10.3 kb and contains seven exons and six introns. NA
nebulin 4703 NEB ENSG00000183091 This gene encodes nebulin, a giant protein component of the cytoskeletal matrix that coexists with the thick and thin filaments within the sarcomeres of skeletal muscle. In most vertebrates, nebulin accounts for 3 to 4% of the total myofibrillar protein. The encoded protein contains approximately 30-amino acid long modules that can be classified into 7 types and other repeated modules. Protein isoform sizes vary from 600 to 800 kD due to alternative splicing that is tissue-, species-, and developmental stage-specific. Of the 183 exons in the nebulin gene, at least 43 are alternatively spliced, although exons 143 and 144 are not found in the same transcript. Of the several thousand transcript variants predicted for nebulin, the RefSeq Project has decided to create three representative RefSeq records. Mutations in this gene are associated with recessive nemaline myopathy. NA
calcium voltage-gated channel auxiliary subunit beta 2 783 CACNB2 ENSG00000165995 This gene encodes a subunit of a voltage-dependent calcium channel protein that is a member of the voltage-gated calcium channel superfamily. The gene product was originally identified as an antigen target in Lambert-Eaton myasthenic syndrome, an autoimmune disorder. Mutations in this gene are associated with Brugada syndrome. Alternatively spliced variants encoding different isoforms have been described. NA
troponin C2, fast skeletal type 7125 TNNC2 ENSG00000101470 Troponin (Tn), a key protein complex in the regulation of striated muscle contraction, is composed of 3 subunits. The Tn-I subunit inhibits actomyosin ATPase, the Tn-T subunit binds tropomyosin and Tn-C, while the Tn-C subunit binds calcium and overcomes the inhibitory action of the troponin complex on actin filaments. The protein encoded by this gene is the Tn-C subunit. NA
myosin binding protein C, slow type 4604 MYBPC1 ENSG00000196091 This gene encodes a member of the myosin-binding protein C family. Myosin-binding protein C family members are myosin-associated proteins found in the cross-bridge-bearing zone (C region) of A bands in striated muscle. The encoded protein is the slow skeletal muscle isoform of myosin-binding protein C and plays an important role in muscle contraction by recruiting muscle-type creatine kinase to myosin filaments. Mutations in this gene are associated with distal arthrogryposis type I. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene. NA
actinin alpha 1 87 ACTN1 ENSG00000072110 Alpha actinins belong to the spectrin gene superfamily which represents a diverse group of cytoskeletal proteins, including the alpha and beta spectrins and dystrophins. Alpha actinin is an actin-binding protein with multiple roles in different cell types. In nonmuscle cells, the cytoskeletal isoform is found along microfilament bundles and adherens-type junctions, where it is involved in binding actin to the membrane. In contrast, skeletal, cardiac, and smooth muscle isoforms are localized to the Z-disc and analogous dense bodies, where they help anchor the myofibrillar actin filaments. This gene encodes a nonmuscle, cytoskeletal, alpha actinin isoform and maps to the same site as the structurally similar erythroid beta spectrin gene. Three transcript variants encoding different isoforms have been found for this gene. NA
ATPase plasma membrane Ca2+ transporting 4 493 ATP2B4 ENSG00000058668 The protein encoded by this gene belongs to the family of P-type primary ion transport ATPases characterized by the formation of an aspartyl phosphate intermediate during the reaction cycle. These enzymes remove bivalent calcium ions from eukaryotic cells against very large concentration gradients and play a critical role in intracellular calcium homeostasis. The mammalian plasma membrane calcium ATPase isoforms are encoded by at least four separate genes and the diversity of these enzymes is further increased by alternative splicing of transcripts. The expression of different isoforms and splice variants is regulated in a developmental, tissue- and cell type-specific manner, suggesting that these pumps are functionally adapted to the physiological needs of particular cells and tissues. This gene encodes the plasma membrane calcium ATPase isoform 4. Alternatively spliced transcript variants encoding different isoforms have been identified. NA
troponin I1, slow skeletal type 7135 TNNI1 ENSG00000159173 Troponin proteins associate with tropomyosin and regulate the calcium sensitivity of the myofibril contractile apparatus of striated muscles. Troponin I (TnI), along with troponin T (TnT) and troponin C (TnC), is one of 3 subunits that form the troponin complex of the thin filaments of striated muscle. TnI is the inhibitory subunit, blocking actin-myosin interactions and thereby mediating striated muscle relaxation. The TnI subfamily contains three genes: TnI-skeletal-fast-twitch, TnI-skeletal-slow-twitch, and TnI-cardiac. The TnI-fast and TnI-slow genes are expressed in fast-twitch and slow-twitch skeletal muscle fibers, respectively, while the TnI-cardiac gene is expressed exclusively in cardiac muscle tissue. This gene encodes the Troponin-I-skeletal-slow-twitch protein. This gene is expressed in cardiac and skeletal muscle during early development but is restricted to slow-twitch skeletal muscle fibers in adults. The encoded protein prevents muscle contraction by inhibiting calcium-mediated conformational changes in actin-myosin complexes. NA
SH3 and cysteine rich domain 3 246329 STAC3 ENSG00000185482 The protein encoded by this gene is a component of the excitation-contraction coupling machinery of muscles. This protein is a member of the Stac gene family and contains an N-terminal cysteine-rich domain and two SH3 domains. Mutations in this gene are a cause of Native American myopathy. NA
myosin, heavy chain 2, skeletal muscle, adult 4620 MYH2 ENSG00000125414 Myosins are actin-based motor proteins that function in the generation of mechanical force in eukaryotic cells. Muscle myosins are heterohexamers composed of 2 myosin heavy chains and 2 pairs of nonidentical myosin light chains. This gene encodes a member of the class II or conventional myosin heavy chains, and functions in skeletal muscle contraction. This gene is found in a cluster of myosin heavy chain genes on chromosome 17. A mutation in this gene results in inclusion body myopathy-3. Multiple alternatively spliced variants, encoding the same protein, have been identified. NA
G protein-coupled receptor 162 27239 GPR162 ENSG00000250510 This gene was identified upon genomic analysis of a gene-dense region at human chromosome 12p13. It appears to be mainly expressed in the brain; however, its function is not known. Alternatively spliced transcript variants encoding different isoforms have been identified. NA
NA ENSG00000257261 RP11-96H19.1 ENSG00000257261 NA NA
long intergenic non-protein coding RNA 1372 101929736 LINC01372 ENSG00000235475 NA NA
SPARC related modular calcium binding 1 64093 SMOC1 ENSG00000198732 This gene encodes a multi-domain secreted protein that may have a critical role in ocular and limb development. Mutations in this gene are associated with microphthalmia and limb anomalies. Alternatively spliced transcript variants encoding different isoforms have been found for this gene. NA
keratin 8 3856 KRT8 ENSG00000170421 This gene is a member of the type II keratin family clustered on the long arm of chromosome 12. Type I and type II keratins heteropolymerize to form intermediate-sized filaments in the cytoplasm of epithelial cells. The product of this gene typically dimerizes with keratin 18 to form an intermediate filament in simple single-layered epithelial cells. This protein plays a role in maintaining cellular structural integrity and also functions in signal transduction and cellular differentiation. Mutations in this gene cause cryptogenic cirrhosis. Alternatively spliced transcript variants have been found for this gene. NA
family with sequence similarity 46 member B 115572 FAM46B ENSG00000158246 NA NA
NA ENSG00000232220 AC008440.5 ENSG00000232220 NA NA
myeloid-associated differentiation marker 91663 MYADM ENSG00000179820 NA NA
cold shock domain containing C2 27254 CSDC2 ENSG00000172346 NA NA
aldolase, fructose-bisphosphate C 230 ALDOC ENSG00000109107 This gene encodes a member of the class I fructose-bisphosphate aldolase gene family. Expressed specifically in the hippocampus and Purkinje cells of the brain, the encoded protein is a glycolytic enzyme that catalyzes the reversible aldol cleavage of fructose-1,6-bisphosphate and fructose 1-phosphate to dihydroxyacetone phosphate and either glyceraldehyde-3-phosphate or glyceraldehyde, respectively. NA
smoothelin 6525 SMTN ENSG00000183963 This gene encodes a structural protein that is found exclusively in contractile smooth muscle cells. It associates with stress fibers and constitutes part of the cytoskeleton. This gene is localized to chromosome 22q12.3, distal to the TUPLE1 locus and outside the DiGeorge syndrome deletion. Alternative splicing of this gene results in multiple transcript variants encoding distinct isoforms. NA
ryanodine receptor 1 6261 RYR1 ENSG00000196218 This gene encodes a ryanodine receptor found in skeletal muscle. The encoded protein functions as a calcium release channel in the sarcoplasmic reticulum but also serves to connect the sarcoplasmic reticulum and transverse tubule. Mutations in this gene are associated with malignant hyperthermia susceptibility, central core disease, and minicore myopathy with external ophthalmoplegia. Alternatively spliced transcripts encoding different isoforms have been described. NA
calmodulin like 6 163688 CALML6 ENSG00000169885 NA NA
zyxin 7791 ZYX ENSG00000159840 Focal adhesions are actin-rich structures that enable cells to adhere to the extracellular matrix and at which protein complexes involved in signal transduction assemble. Zyxin is a zinc-binding phosphoprotein that concentrates at focal adhesions and along the actin cytoskeleton. Zyxin has an N-terminal proline-rich domain and three LIM domains in its C-terminal half. The proline-rich domain may interact with SH3 domains of proteins involved in signal transduction pathways while the LIM domains are likely involved in protein-protein binding. Zyxin may function as a messenger in the signal transduction pathway that mediates adhesion-stimulated changes in gene expression and may modulate the cytoskeletal organization of actin bundles. Alternative splicing results in multiple transcript variants that encode the same isoform. NA
CACNA1C antisense RNA 2 100874235 CACNA1C-AS2 ENSG00000256271 NA NA
prostaglandin F2 receptor inhibitor 5738 PTGFRN ENSG00000134247 NA NA
KIAA1217 56243 KIAA1217 ENSG00000120549 NA NA
phospholipase A2 group V 5322 PLA2G5 ENSG00000127472 This gene is a member of the secretory phospholipase A2 family. It is located in a tightly-linked cluster of secretory phospholipase A2 genes on chromosome 1. The encoded enzyme catalyzes the hydrolysis of membrane phospholipids to generate lysophospholipids and free fatty acids including arachidonic acid. It preferentially hydrolyzes linoleoyl-containing phosphatidylcholine substrates. Secretion of this enzyme is thought to induce inflammatory responses in neighboring cells. Alternatively spliced transcript variants have been found, but their full-length nature has not been determined. NA
apolipoprotein D 347 APOD ENSG00000189058 This gene encodes a component of high density lipoprotein that has no marked similarity to other apolipoprotein sequences. It has a high degree of homology to plasma retinol-binding protein and other members of the alpha 2 microglobulin protein superfamily of carrier proteins, also known as lipocalins. This glycoprotein is closely associated with the enzyme lecithin:cholesterol acyltransferase - an enzyme involved in lipoprotein metabolism. NA
ST3GAL5 antisense RNA 1 (head to head) ENSG00000232504 ST3GAL5-AS1 ENSG00000232504 NA NA
dishevelled binding antagonist of beta catenin 3 147906 DACT3 ENSG00000197380 NA NA
peptidyl arginine deiminase 2 11240 PADI2 ENSG00000117115 This gene encodes a member of the peptidyl arginine deiminase family of enzymes, which catalyze the post-translational deimination of proteins by converting arginine residues into citrullines in the presence of calcium ions. The family members have distinct substrate specificities and tissue-specific expression patterns. The type II enzyme is the most widely expressed family member. Known substrates for this enzyme include myelin basic protein in the central nervous system and vimentin in skeletal muscle and macrophages. This enzyme is thought to play a role in the onset and progression of neurodegenerative human disorders, including Alzheimer disease and multiple sclerosis, and it has also been implicated in glaucoma pathogenesis. This gene exists in a cluster with four other paralogous genes. NA
phosphorylase, glycogen; brain 5834 PYGB ENSG00000100994 The protein encoded by this gene is a glycogen phosphorylase found predominantly in the brain. The encoded protein forms homodimers which can associate into homotetramers, the enzymatically active form of glycogen phosphorylase. The activity of this enzyme is positively regulated by AMP and negatively regulated by ATP, ADP, and glucose-6-phosphate. This enzyme catalyzes the rate-determining step in glycogen degradation. NA
protein phosphatase 1 regulatory subunit 12C 54776 PPP1R12C ENSG00000125503 The gene encodes a subunit of myosin phosphatase. The encoded protein regulates the catalytic activity of protein phosphatase 1 delta and assembly of the actin cytoskeleton. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene. NA
myozenin 1 58529 MYOZ1 ENSG00000177791 The protein encoded by this gene is primarily expressed in the skeletal muscle, and belongs to the myozenin family. Members of this family function as calcineurin-interacting proteins that help tether calcineurin to the sarcomere of cardiac and skeletal muscle. They play an important role in modulation of calcineurin signaling. NA
synaptogyrin 3 9143 SYNGR3 ENSG00000127561 This gene encodes an integral membrane protein. The exact function of this protein is unclear, but studies of a similar murine protein suggest that it is a synaptic vesicle protein that also interacts with the dopamine transporter. The gene product belongs to the synaptogyrin gene family. NA
mitogen-activated protein kinase kinase 6 5608 MAP2K6 ENSG00000108984 This gene encodes a member of the dual specificity protein kinase family, which functions as a mitogen-activated protein (MAP) kinase kinase. MAP kinases, also known as extracellular signal-regulated kinases (ERKs), act as an integration point for multiple biochemical signals. This protein phosphorylates and activates p38 MAP kinase in response to inflammatory cytokines or environmental stress. As an essential component of p38 MAP kinase mediated signal transduction pathway, this gene is involved in many cellular processes such as stress induced cell cycle arrest, transcription activation and apoptosis. NA
chromosome 8 open reading frame 88 100127983 C8orf88 ENSG00000253250 NA NA
NA ENSG00000249863 RP11-177C12.1 ENSG00000249863 NA NA
prostaglandin-endoperoxide synthase 1 5742 PTGS1 ENSG00000095303 This is one of two genes encoding similar enzymes that catalyze the conversion of arachinodate to prostaglandin. The encoded protein regulates angiogenesis in endothelial cells, and is inhibited by nonsteroidal anti-inflammatory drugs such as aspirin. Based on its ability to function as both a cyclooxygenase and as a peroxidase, the encoded protein has been identified as a moonlighting protein. The protein may promote cell proliferation during tumor progression. Alternative splicing results in multiple transcript variants. NA
NA NA NA ENSG00000259716 NA TRUE
NA ENSG00000268707 RP11-247A12.7 ENSG00000268707 NA NA
sperm-tail PG-rich repeat containing 3 441476 STPG3 ENSG00000197768 NA NA
glutamic pyruvate transaminase (alanine aminotransferase) 2 84706 GPT2 ENSG00000166123 This gene encodes a mitochondrial alanine transaminase, a pyridoxal enzyme that catalyzes the reversible transamination between alanine and 2-oxoglutarate to generate pyruvate and glutamate. Alanine transaminases play roles in gluconeogenesis and amino acid metabolism in many tissues including skeletal muscle, kidney, and liver. Activating transcription factor 4 upregulates this gene under metabolic stress conditions in hepatocyte cell lines. A loss of function mutation in this gene has been associated with developmental encephalopathy. Alternative splicing results in multiple transcript variants. NA
PRKAR2A antisense RNA 1 100506637 PRKAR2A-AS1 ENSG00000224424 NA NA
cadherin 2 1000 CDH2 ENSG00000170558 This gene encodes a classical cadherin and member of the cadherin superfamily. Alternative splicing results in multiple transcript variants, at least one of which encodes a preproprotein is proteolytically processed to generate a calcium-dependent cell adhesion molecule and glycoprotein. This protein plays a role in the establishment of left-right asymmetry, development of the nervous system and the formation of cartilage and bone. NA
cell cycle exit and neuronal differentiation 1 51286 CEND1 ENSG00000184524 The protein encoded by this gene is a neuron-specific protein. The similar protein in pig enhances neuroblastoma cell differentiation in vitro and may be involved in neuronal differentiation in vivo. Multiple pseudogenes have been reported for this gene. NA
sterile alpha motif domain containing 13 148418 SAMD13 ENSG00000203943 NA NA
vitronectin 7448 VTN ENSG00000109072 The protein encoded by this gene is a member of the pexin family. It is found in serum and tissues and promotes cell adhesion and spreading, inhibits the membrane-damaging effect of the terminal cytolytic complement pathway, and binds to several serpin serine protease inhibitors. It is a secreted protein and exists in either a single chain form or a clipped, two chain form held together by a disulfide bond. NA
myosin light chain 9 10398 MYL9 ENSG00000101335 Myosin, a structural component of muscle, consists of two heavy chains and four light chains. The protein encoded by this gene is a myosin light chain that may regulate muscle contraction by modulating the ATPase activity of myosin heads. The encoded protein binds calcium and is activated by myosin light chain kinase. Two transcript variants encoding different isoforms have been found for this gene. NA
SRRM2 antisense RNA 1 100128788 SRRM2-AS1 ENSG00000205913 NA NA
NA ENSG00000272735 RP11-467P9.1 ENSG00000272735 NA NA
solute carrier family 16 member 9 220963 SLC16A9 ENSG00000165449 NA NA
phosphatidylinositol-4-phosphate 5-kinase type 1 gamma 23396 PIP5K1C ENSG00000186111 This locus encodes a type I phosphatidylinositol 4-phosphate 5-kinase. The encoded protein catalyzes phosphorylation of phosphatidylinositol 4-phosphate, producing phosphatidylinositol 4,5-bisphosphate. This enzyme is found at synapses and has been found to play roles in endocytosis and cell migration. Mutations at this locus have been associated with lethal congenital contractural syndrome. Alternatively spliced transcript variants encoding different isoforms have been described. NA
NA ENSG00000217648 RP1-95L4.4 ENSG00000217648 NA NA
ubiquitin specific peptidase 6 9098 USP6 ENSG00000129204 NA NA
ATPase phospholipid transporting 8A1 10396 ATP8A1 ENSG00000124406 The P-type adenosinetriphosphatases (P-type ATPases) are a family of proteins which use the free energy of ATP hydrolysis to drive uphill transport of ions across membranes. Several subfamilies of P-type ATPases have been identified. One subfamily catalyzes transport of heavy metal ions. Another subfamily transports non-heavy metal ions (NMHI). The protein encoded by this gene is a member of the third subfamily of P-type ATPases and acts to transport amphipaths, such as phosphatidylserine. Two transcript variants encoding different isoforms have been found for this gene. NA
vasodilator-stimulated phosphoprotein 7408 VASP ENSG00000125753 Vasodilator-stimulated phosphoprotein (VASP) is a member of the Ena-VASP protein family. Ena-VASP family members contain an EHV1 N-terminal domain that binds proteins containing E/DFPPPPXD/E motifs and targets Ena-VASP proteins to focal adhesions. In the mid-region of the protein, family members have a proline-rich domain that binds SH3 and WW domain-containing proteins. Their C-terminal EVH2 domain mediates tetramerization and binds both G and F actin. VASP is associated with filamentous actin formation and likely plays a widespread role in cell adhesion and motility. VASP may also be involved in the intracellular signaling pathways that regulate integrin-extracellular matrix interactions. VASP is regulated by the cyclic nucleotide-dependent kinases PKA and PKG. NA
paired related homeobox 1 5396 PRRX1 ENSG00000116132 The DNA-associated protein encoded by this gene is a member of the paired family of homeobox proteins localized to the nucleus. The protein functions as a transcription co-activator, enhancing the DNA-binding activity of serum response factor, a protein required for the induction of genes by growth and differentiation factors. The protein regulates muscle creatine kinase, indicating a role in the establishment of diverse mesodermal muscle types. Alternative splicing yields two isoforms that differ in abundance and expression patterns. NA
actin, alpha, cardiac muscle 1 70 ACTC1 ENSG00000159251 Actins are highly conserved proteins that are involved in various types of cell motility. Polymerization of globular actin (G-actin) leads to a structural filament (F-actin) in the form of a two-stranded helix. Each actin can bind to four others. The protein encoded by this gene belongs to the actin family which is comprised of three main groups of actin isoforms, alpha, beta, and gamma. The alpha actins are found in muscle tissues and are a major constituent of the contractile apparatus. Defects in this gene have been associated with idiopathic dilated cardiomyopathy (IDC) and familial hypertrophic cardiomyopathy (FHC). NA
PPARG coactivator 1 beta 133522 PPARGC1B ENSG00000155846 The protein encoded by this gene stimulates the activity of several transcription factors and nuclear receptors, including estrogen receptor alpha, nuclear respiratory factor 1, and glucocorticoid receptor. The encoded protein may be involved in fat oxidation, non-oxidative glucose metabolism, and the regulation of energy expenditure. This protein is downregulated in prediabetic and type 2 diabetes mellitus patients. Certain allelic variations in this gene increase the risk of the development of obesity. Three transcript variants encoding different isoforms have been found for this gene. NA
protein kinase cAMP-dependent type I regulatory subunit beta 5575 PRKAR1B ENSG00000188191 The protein encoded by this gene is a regulatory subunit of cyclic AMP-dependent protein kinase A (PKA), which is involved in the signaling pathway of the second messenger cAMP. Two regulatory and two catalytic subunits form the PKA holoenzyme, disbands after cAMP binding. The holoenzyme is involved in many cellular events, including ion transport, metabolism, and transcription. Several transcript variants encoding the same protein have been found for this gene. NA
serum response factor 6722 SRF ENSG00000112658 This gene encodes a ubiquitous nuclear protein that stimulates both cell proliferation and differentiation. It is a member of the MADS (MCM1, Agamous, Deficiens, and SRF) box superfamily of transcription factors. This protein binds to the serum response element (SRE) in the promoter region of target genes. This protein regulates the activity of many immediate-early genes, for example c-fos, and thereby participates in cell cycle regulation, apoptosis, cell growth, and cell differentiation. This gene is the downstream target of many pathways; for example, the mitogen-activated protein kinase pathway (MAPK) that acts through the ternary complex factors (TCFs). Two transcript variants encoding different isoforms have been found for this gene. NA
butyrylcholinesterase 590 BCHE ENSG00000114200 Mutant alleles at the BCHE locus are responsible for suxamethonium sensitivity. Homozygous persons sustain prolonged apnea after administration of the muscle relaxant suxamethonium in connection with surgical anesthesia. The activity of pseudocholinesterase in the serum is low and its substrate behavior is atypical. In the absence of the relaxant, the homozygote is at no known disadvantage. NA
NA ENSG00000186076 RP11-887P2.3 ENSG00000186076 NA NA
long intergenic non-protein coding RNA 1135 ENSG00000234807 LINC01135 ENSG00000234807 NA NA
epithelial membrane protein 3 2014 EMP3 ENSG00000142227 The protein encoded by this gene belongs to the PMP-22/EMP/MP20 family of proteins. The protein contains four transmembrane domains and two N-linked glycosylation sites. It is thought to be involved in cell proliferation, cell-cell interactions and function as a tumor suppressor. Alternative splicing results in multiple transcript variants. NA
adenylate kinase 4 205 AK4 ENSG00000162433 This gene encodes a member of the adenylate kinase family of enzymes. The encoded protein is localized to the mitochondrial matrix. Adenylate kinases regulate the adenine and guanine nucleotide compositions within a cell by catalyzing the reversible transfer of phosphate group among these nucleotides. Five isozymes of adenylate kinase have been identified in vertebrates. Expression of these isozymes is tissue-specific and developmentally regulated. A pseudogene for this gene has been located on chromosome 17. Three transcript variants encoding the same protein have been identified for this gene. Sequence alignment suggests that the gene defined by NM_013410, NM_203464, and NM_001005353 is located on chromosome 1. NA
family with sequence similarity 175 member A 84142 FAM175A ENSG00000163322 NA NA
ankyrin repeat domain 23 200539 ANKRD23 ENSG00000163126 This gene is a member of the muscle ankyrin repeat protein (MARP) family and encodes a protein with four tandem ankyrin-like repeats. The protein is localized to the nucleus, functioning as a transcriptional regulator. Expression of this protein is induced during recovery following starvation. NA
uveal autoantigen with coiled-coil domains and ankyrin repeats 55075 UACA ENSG00000137831 NA NA
phosphoglycerate mutase 1 5223 PGAM1 ENSG00000171314 The protein encoded by this gene is a mutase that catalyzes the reversible reaction of 3-phosphoglycerate (3-PGA) to 2-phosphoglycerate (2-PGA) in the glycolytic pathway. Two transcript variants encoding different isoforms have been found for this gene. NA
TIPARP antisense RNA 1 ENSG00000243926 TIPARP-AS1 ENSG00000243926 NA NA
transcription factor CP2-like 1 29842 TFCP2L1 ENSG00000115112 NA NA
cilia and flagella associated protein 53 220136 CFAP53 ENSG00000172361 This gene belongs to the CFAP53 family. It was found to be differentially expressed by the ciliated cells of frog epidermis and in skin fibroblasts from human. Mutations in this gene are associated with visceral heterotaxy-6, which implicates this gene in determination of left-right asymmetric patterning. NA
AF4/FMR2 family member 3 3899 AFF3 ENSG00000144218 This gene encodes a tissue-restricted nuclear transcriptional activator that is preferentially expressed in lymphoid tissue. Isolation of this protein initially defined a highly conserved LAF4/MLLT2 gene family of nuclear transcription factors that may function in lymphoid development and oncogenesis. In some ALL patients, this gene has been found fused to the gene for MLL. Multiple alternatively spliced transcript variants that encode different proteins have been found for this gene. NA
NA ENSG00000245864 CTC-467M3.1 ENSG00000245864 NA NA
myosin light chain 7 58498 MYL7 ENSG00000106631 NA NA
transmembrane protein 52 339456 TMEM52 ENSG00000178821 NA NA
sperm acrosome associated 6 147650 SPACA6 ENSG00000182310 NA NA
transmembrane protein 240 339453 TMEM240 ENSG00000205090 This gene encodes a transmembrane-domain containing protein found in the brain and cerebellum. Mutations in this gene result in spinocerebellar ataxia 21. NA
tetratricopeptide repeat and ankyrin repeat containing 1 9881 TRANK1 ENSG00000168016 NA NA
Purkinje cell protein 4 like 1 654790 PCP4L1 ENSG00000248485 NA NA
NA ENSG00000265168 RP11-192H23.5 ENSG00000265168 NA NA
regulator of G-protein signaling 2 5997 RGS2 ENSG00000116741 Regulator of G protein signaling (RGS) family members are regulatory molecules that act as GTPase activating proteins (GAPs) for G alpha subunits of heterotrimeric G proteins. RGS proteins are able to deactivate G protein subunits of the Gi alpha, Go alpha and Gq alpha subtypes. They drive G proteins into their inactive GDP-bound forms. Regulator of G protein signaling 2 belongs to this family. The protein acts as a mediator of myeloid differentiation and may play a role in leukemogenesis. NA
NA ENSG00000260572 RP11-16N11.2 ENSG00000260572 NA NA
# Save the Ensembl IDs queried for factor 19, one per line.
write.table(as.factor(out$query),
            paste0("../utilities/GTEX2013_sparse_fac_voom/gene_names_clus_", 19, ".txt"),
            col.names = FALSE, row.names = FALSE, quote = FALSE)
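
The Factor 19 table contains a row whose `notfound` column is TRUE (ENSG00000259716), meaning mygene returned no annotation for that ID, yet the `write.table` call saves every queried ID. If a downstream tool should receive only the resolved IDs, the split could be done as in the sketch below. This is an illustration under assumptions, not part of the original analysis: `out_df` is a hypothetical stand-in for `as.data.frame(out)` holding three rows copied from the table, and the output goes to a temporary file rather than the `../utilities/` path.

```r
# Hypothetical stand-in for as.data.frame(out); the three rows are copied
# from the Factor 19 table (PTGS1, an unmatched ID, and STPG3).
out_df <- data.frame(
  query    = c("ENSG00000095303", "ENSG00000259716", "ENSG00000197768"),
  symbol   = c("PTGS1", NA, "STPG3"),
  notfound = c(NA, TRUE, NA),
  stringsAsFactors = FALSE
)

# `notfound` is NA for resolved queries and TRUE for misses,
# so only an explicit TRUE is treated as "missing".
is_missing  <- !is.na(out_df$notfound) & out_df$notfound
matched_ids <- out_df$query[!is_missing]
missing_ids <- out_df$query[is_missing]

# Temporary file used for illustration only (an assumption of this sketch).
out_file <- tempfile(fileext = ".txt")
write.table(matched_ids, out_file,
            col.names = FALSE, row.names = FALSE, quote = FALSE)
```

Keeping `missing_ids` as a separate vector makes it easy to report which Ensembl IDs had no annotation instead of silently dropping them.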

Factor 20 Annotations

out <- mygene::queryMany(gene_list[20, ], scopes = "ensembl.gene", fields = c("name", "summary", "symbol"), species = "human")
## Finished
## Pass returnall=TRUE to return lists of duplicate or missing query terms.
kable(as.data.frame(out))
symbol X_id query name summary notfound
GTF2IP13 ENSG00000272556 ENSG00000272556 general transcription factor IIi pseudogene 13 NA NA
FOSL1 8061 ENSG00000175592 FOS like 1, AP-1 transcription factor subunit The Fos gene family consists of 4 members: FOS, FOSB, FOSL1, and FOSL2. These genes encode leucine zipper proteins that can dimerize with proteins of the JUN family, thereby forming the transcription factor complex AP-1. As such, the FOS proteins have been implicated as regulators of cell proliferation, differentiation, and transformation. Several transcript variants encoding different isoforms have been found for this gene. NA
CCER2 643669 ENSG00000262484 coiled-coil glutamate rich protein 2 NA NA
CEBPA 1050 ENSG00000245848 CCAAT/enhancer binding protein alpha This intronless gene encodes a transcription factor that contains a basic leucine zipper (bZIP) domain and recognizes the CCAAT motif in the promoters of target genes. The encoded protein functions in homodimers and also heterodimers with CCAAT/enhancer-binding proteins beta and gamma. Activity of this protein can modulate the expression of genes involved in cell cycle regulation as well as in body weight homeostasis. Mutation of this gene is associated with acute myeloid leukemia. The use of alternative in-frame non-AUG (GUG) and AUG start codons results in protein isoforms with different lengths. Differential translation initiation is mediated by an out-of-frame, upstream open reading frame which is located between the GUG and the first AUG start codons. NA
CTD-3025N20.3 ENSG00000272010 ENSG00000272010 NA NA NA
AC017101.10 ENSG00000227227 ENSG00000227227 NA NA NA
IPO7P2 ENSG00000225674 ENSG00000225674 importin 7 pseudogene 2 NA NA
GLI1 2735 ENSG00000111087 GLI family zinc finger 1 This gene encodes a member of the Kruppel family of zinc finger proteins. The encoded transcription factor is activated by the sonic hedgehog signal transduction cascade and regulates stem cell proliferation. The activity and nuclear localization of this protein is negatively regulated by p53 in an inhibitory loop. Multiple transcript variants encoding different isoforms have been found for this gene. NA
NAMPTP1 ENSG00000229644 ENSG00000229644 nicotinamide phosphoribosyltransferase pseudogene 1 NA NA
GPRC5A 9052 ENSG00000013588 G protein-coupled receptor class C group 5 member A This gene encodes a member of the type 3 G protein-coupling receptor family, characterized by the signature 7-transmembrane domain motif. The encoded protein may be involved in interaction between retinoid acid and G protein signalling pathways. Retinoic acid plays a critical role in development, cellular growth, and differentiation. This gene may play a role in embryonic development and epithelial cell differentiation. NA
NR4A3 8013 ENSG00000119508 nuclear receptor subfamily 4 group A member 3 This gene encodes a member of the steroid-thyroid hormone-retinoid receptor superfamily. The encoded protein may act as a transcriptional activator. The protein can efficiently bind the NGFI-B Response Element (NBRE). Three different versions of extraskeletal myxoid chondrosarcomas (EMCs) are the result of reciprocal translocations between this gene and other genes. The translocation breakpoints are associated with Nuclear Receptor Subfamily 4, Group A, Member 3 (on chromosome 9) and either Ewing Sarcome Breakpoint Region 1 (on chromosome 22), RNA Polymerase II, TATA Box-Binding Protein-Associated Factor, 68-KD (on chromosome 17), or Transcription factor 12 (on chromosome 15). Multiple transcript variants encoding different isoforms have been found for this gene. NA
CCDC150 284992 ENSG00000144395 coiled-coil domain containing 150 NA NA
CIDEC 63924 ENSG00000187288 cell death inducing DFFA like effector c This gene encodes a member of the cell death-inducing DNA fragmentation factor-like effector family. Members of this family play important roles in apoptosis. The encoded protein promotes lipid droplet formation in adipocytes and may mediate adipocyte apoptosis. This gene is regulated by insulin and its expression is positively correlated with insulin sensitivity. Mutations in this gene may contribute to insulin resistant diabetes. A pseudogene of this gene is located on the short arm of chromosome 3. Alternatively spliced transcript variants that encode different isoforms have been observed for this gene. NA
FXYD1 5348 ENSG00000266964 FXYD domain containing ion transport regulator 1 This gene encodes a member of a family of small membrane proteins that share a 35-amino acid signature sequence domain, beginning with the sequence PFXYD and containing 7 invariant and 6 highly conserved amino acids. The approved human gene nomenclature for the family is FXYD-domain containing ion transport regulator. Mouse FXYD5 has been termed RIC (Related to Ion Channel). FXYD2, also known as the gamma subunit of the Na,K-ATPase, regulates the properties of that enzyme. FXYD1 (phospholemman), FXYD2 (gamma), FXYD3 (MAT-8), FXYD4 (CHIF), and FXYD5 (RIC) have been shown to induce channel activity in experimental expression systems. Transmembrane topology has been established for two family members (FXYD1 and FXYD2), with the N-terminus extracellular and the C-terminus on the cytoplasmic side of the membrane. The protein encoded by this gene is a plasma membrane substrate for several kinases, including protein kinase A, protein kinase C, NIMA kinase, and myotonic dystrophy kinase. It is thought to form an ion channel or regulate ion channel activity. Transcript variants with different 5’ UTR sequences have been described in the literature. NA
CTD-2527I21.4 ENSG00000221857 ENSG00000221857 NA NA NA
CYP1A1 1543 ENSG00000140465 cytochrome P450 family 1 subfamily A member 1 This gene, CYP1A1, encodes a member of the cytochrome P450 superfamily of enzymes. The cytochrome P450 proteins are monooxygenases which catalyze many reactions involved in drug metabolism and synthesis of cholesterol, steroids and other lipids. This protein localizes to the endoplasmic reticulum and its expression is induced by some polycyclic aromatic hydrocarbons (PAHs), some of which are found in cigarette smoke. The enzyme’s endogenous substrate is unknown; however, it is able to metabolize some PAHs to carcinogenic intermediates. The gene has been associated with lung cancer risk. A related family member, CYP1A2, is located approximately 25 kb away from CYP1A1 on chromosome 15. Alternative splicing results in multiple transcript variants encoding distinct isoforms. NA
SRXN1 140809 ENSG00000271303 sulfiredoxin 1 NA NA
ACTG1P17 283693 ENSG00000259315 actin gamma 1 pseudogene 17 NA NA
KRTAP5-9 3846 ENSG00000254997 keratin associated protein 5-9 NA NA
CTD-2517M22.14 ENSG00000255182 ENSG00000255182 NA NA NA
ZNF770 54989 ENSG00000198146 zinc finger protein 770 NA NA
RP11-618G20.1 ENSG00000258964 ENSG00000258964 NA NA NA
VLDLR-AS1 401491 ENSG00000236404 VLDLR antisense RNA 1 NA NA
GPT 2875 ENSG00000167701 glutamic-pyruvate transaminase (alanine aminotransferase) This gene encodes cytosolic alanine aminotransaminase 1 (ALT1); also known as glutamate-pyruvate transaminase 1. This enzyme catalyzes the reversible transamination between alanine and 2-oxoglutarate to generate pyruvate and glutamate and, therefore, plays a key role in the intermediary metabolism of glucose and amino acids. Serum activity levels of this enzyme are routinely used as a biomarker of liver injury caused by drug toxicity, infection, alcohol, and steatosis. A related gene on chromosome 16 encodes a putative mitochondrial alanine aminotransaminase. NA
UBE2V1P2 ENSG00000214192 ENSG00000214192 ubiquitin conjugating enzyme E2 variant 1 pseudogene 2 NA NA
RP11-134K13.4 ENSG00000271967 ENSG00000271967 NA NA NA
NPM1P39 ENSG00000225159 ENSG00000225159 nucleophosmin 1 (nucleolar phosphoprotein B23, numatrin) pseudogene 39 NA NA
RP11-1096G20.5 ENSG00000266368 ENSG00000266368 NA NA NA
RARRES1 5918 ENSG00000118849 retinoic acid receptor responder 1 This gene was identified as a retinoid acid (RA) receptor-responsive gene. It encodes a type 1 membrane protein. The expression of this gene is upregulated by tazarotene as well as by retinoic acid receptors. The expression of this gene is found to be downregulated in prostate cancer, which is caused by the methylation of its promoter and CpG island. Alternatively spliced transcript variant encoding distinct isoforms have been observed. NA
RP11-130L8.2 ENSG00000269976 ENSG00000269976 NA NA NA
IFFO2 126917 ENSG00000169991 intermediate filament family orphan 2 NA NA
MTPN 136319 ENSG00000105887 myotrophin The transcript produced from this gene is bi-cistronic and can encode both myotrophin and leucine zipper protein 6. The myotrophin protein is associated with cardiac hypertrophy, where it is involved in the conversion of NFkappa B p50-p65 heterodimers to p50-p50 and p65-p65 homodimers. This protein also has a potential function in cerebellar morphogenesis, and it may be involved in the differentiation of cerebellar neurons, particularly of granule cells. A cryptic ORF at the 3’ end of this transcript uses a novel internal ribosome entry site and a non-AUG translation initiation codon to produce leucine zipper protein 6, a 6.4 kDa tumor antigen that is associated with myeloproliferative disease. NA
RP11-299M14.2 ENSG00000255343 ENSG00000255343 NA NA NA
NA NA ENSG00000273075 NA NA TRUE
SLC35E1 79939 ENSG00000127526 solute carrier family 35 member E1 NA NA
FAM229A 100128071 ENSG00000225828 family with sequence similarity 229 member A NA NA
SMARCA5-AS1 ENSG00000245112 ENSG00000245112 SMARCA5 antisense RNA 1 NA NA
HIF1A 3091 ENSG00000100644 hypoxia inducible factor 1 alpha subunit This gene encodes the alpha subunit of transcription factor hypoxia-inducible factor-1 (HIF-1), which is a heterodimer composed of an alpha and a beta subunit. HIF-1 functions as a master regulator of cellular and systemic homeostatic response to hypoxia by activating transcription of many genes, including those involved in energy metabolism, angiogenesis, apoptosis, and other genes whose protein products increase oxygen delivery or facilitate metabolic adaptation to hypoxia. HIF-1 thus plays an essential role in embryonic vascularization, tumor angiogenesis and pathophysiology of ischemic disease. Alternatively spliced transcript variants encoding different isoforms have been identified for this gene. NA
CAHM 100526820 ENSG00000270419 colon adenocarcinoma hypermethylated (non-protein coding) NA NA
GSDMB 55876 ENSG00000073605 gasdermin B This gene encodes a member of the gasdermin-domain containing protein family. Other gasdermin-family genes are implicated in the regulation of apoptosis in epithelial cells, and are linked to cancer. Multiple transcript variants encoding different isoforms have been found for this gene. Additional variants have been described, but they are candidates for nonsense-mediated mRNA decay (NMD) and are unlikely to be protein-coding. NA
TBX15 6913 ENSG00000092607 T-box 15 This gene belongs to the T-box family of genes, which encode a phylogenetically conserved family of transcription factors that regulate a variety of developmental processes. All these genes contain a common T-box DNA-binding domain. Mutations in this gene are associated with Cousin syndrome. NA
ARG2 384 ENSG00000081181 arginase 2 Arginase catalyzes the hydrolysis of arginine to ornithine and urea. At least two isoforms of mammalian arginase exists (types I and II) which differ in their tissue distribution, subcellular localization, immunologic crossreactivity and physiologic function. The type II isoform encoded by this gene, is located in the mitochondria and expressed in extra-hepatic tissues, especially kidney. The physiologic role of this isoform is poorly understood; it is thought to play a role in nitric oxide and polyamine metabolism. Transcript variants of the type II gene resulting from the use of alternative polyadenylation sites have been described. NA
NPM1P6 ENSG00000213881 ENSG00000213881 nucleophosmin 1 (nucleolar phosphoprotein B23, numatrin) pseudogene 6 NA NA
MIR3661 100500905 ENSG00000266751 microRNA 3661 microRNAs (miRNAs) are short (20-24 nt) non-coding RNAs that are involved in post-transcriptional regulation of gene expression in multicellular organisms by affecting both the stability and translation of mRNAs. miRNAs are transcribed by RNA polymerase II as part of capped and polyadenylated primary transcripts (pri-miRNAs) that can be either protein-coding or non-coding. The primary transcript is cleaved by the Drosha ribonuclease III enzyme to produce an approximately 70-nt stem-loop precursor miRNA (pre-miRNA), which is further cleaved by the cytoplasmic Dicer ribonuclease to generate the mature miRNA and antisense miRNA star (miRNA*) products. The mature miRNA is incorporated into a RNA-induced silencing complex (RISC), which recognizes target mRNAs through imperfect base pairing with the miRNA and most commonly results in translational inhibition or destabilization of the target mRNA. The RefSeq represents the predicted microRNA stem-loop. NA
RP1-117B12.4 ENSG00000253102 ENSG00000253102 NA NA NA
AZGP1 563 ENSG00000160862 alpha-2-glycoprotein 1, zinc-binding NA NA
RP11-46F15.2 ENSG00000238260 ENSG00000238260 NA NA NA
RP4-791M13.3 ENSG00000254539 ENSG00000254539 NA NA NA
NAMPT 10135 ENSG00000105835 nicotinamide phosphoribosyltransferase This gene encodes a protein that catalyzes the condensation of nicotinamide with 5-phosphoribosyl-1-pyrophosphate to yield nicotinamide mononucleotide, one step in the biosynthesis of nicotinamide adenine dinucleotide. The protein belongs to the nicotinic acid phosphoribosyltransferase (NAPRTase) family and is thought to be involved in many important biological processes, including metabolism, stress response and aging. This gene has a pseudogene on chromosome 10. NA
DDX21 9188 ENSG00000165732 DEAD-box helicase 21 DEAD box proteins, characterized by the conserved motif Asp-Glu-Ala-Asp (DEAD), are putative RNA helicases. They are implicated in a number of cellular processes involving alteration of RNA secondary structure such as translation initiation, nuclear and mitochondrial splicing, and ribosome and spliceosome assembly. Based on their distribution patterns, some members of this family are believed to be involved in embryogenesis, spermatogenesis, and cellular growth and division. This gene encodes a DEAD box protein, which is an antigen recognized by autoimmune antibodies from a patient with watermelon stomach disease. This protein unwinds double-stranded RNA, folds single-stranded RNA, and may play important roles in ribosomal RNA biogenesis, RNA editing, RNA transport, and general transcription. NA
RP11-457M11.5 ENSG00000261584 ENSG00000261584 NA NA NA
AC025442.3 ENSG00000253744 ENSG00000253744 NA NA NA
KIAA1683 80726 ENSG00000130518 KIAA1683 NA NA
RP11-458D21.1 ENSG00000233396 ENSG00000233396 NA NA NA
LRRC59 55379 ENSG00000108829 leucine rich repeat containing 59 NA NA
TWF1P1 ENSG00000178082 ENSG00000178082 twinfilin 1 pseudogene 1 NA NA
ZNF426 79088 ENSG00000130818 zinc finger protein 426 Kaposi’s sarcoma-associated herpesvirus (KSHV) can be reactivated from latency by the viral protein RTA. The protein encoded by this gene is a zinc finger transcriptional repressor that interacts with RTA to modulate RTA-mediated reactivation of KSHV. While the encoded protein can repress KSHV reactivation, RTA can induce degradation of this protein through the ubiquitin-proteasome pathway to overcome the repression. Several transcript variants encoding different isoforms have been found for this gene. NA
NRBP2 340371 ENSG00000185189 nuclear receptor binding protein 2 NA NA
RP11-127B20.3 ENSG00000272677 ENSG00000272677 NA NA NA
RP11-299G20.2 ENSG00000259172 ENSG00000259172 NA NA NA
YWHAZP3 ENSG00000229932 ENSG00000229932 tyrosine 3-monooxygenase/tryptophan 5-monooxygenase activation protein, zeta pseudogene 3 NA NA
CTD-2373J6.1 ENSG00000260871 ENSG00000260871 NA NA NA
NA NA ENSG00000269942 NA NA TRUE
RP11-16E12.2 ENSG00000259772 ENSG00000259772 NA NA NA
SNHG11 128439 ENSG00000174365 small nucleolar RNA host gene 11 This gene is a member of the non-protein-coding multiple snoRNA host gene family. Two snoRNAs are derived from the introns of this host gene. Although many alternative splice variants have been observed, the gene is thought to have no protein-coding potential. NA
HESX1 8820 ENSG00000163666 HESX homeobox 1 This gene encodes a conserved homeobox protein that is a transcriptional repressor in the developing forebrain and pituitary gland. Mutations in this gene are associated with septooptic dysplasia, HESX1-related growth hormone deficiency, and combined pituitary hormone deficiency. NA
RP11-561C5.4 ENSG00000229212 ENSG00000229212 NA NA NA
CTC-336P14.1 ENSG00000271228 ENSG00000271228 NA NA NA
AC016722.4 ENSG00000228925 ENSG00000228925 NA NA NA
RP11-1277A3.3 ENSG00000272459 ENSG00000272459 NA NA NA
HSPH1 10808 ENSG00000120694 heat shock protein family H (Hsp110) member 1 NA NA
MAPK13 5603 ENSG00000156711 mitogen-activated protein kinase 13 This gene encodes a member of the mitogen-activated protein (MAP) kinase family. MAP kinases act as an integration point for multiple biochemical signals, and are involved in a wide variety of cellular processes such as proliferation, differentiation, transcription regulation and development. The encoded protein is a p38 MAP kinase and is activated by proinflammatory cytokines and cellular stress. Substrates of the encoded protein include the transcription factor ATF2 and the microtubule dynamics regulator stathmin. Alternatively spliced transcript variants have been observed for this gene. NA
AKR7L ENSG00000211454 ENSG00000211454 aldo-keto reductase family 7-like (gene/pseudogene) NA NA
RP5-867C24.5 ENSG00000261872 ENSG00000261872 NA NA NA
NA NA ENSG00000267167 NA NA TRUE
TDGP1 ENSG00000255725 ENSG00000255725 thymine-DNA glycosylase pseudogene 1 NA NA
MIR3652 100500842 ENSG00000265072 microRNA 3652 microRNAs (miRNAs) are short (20-24 nt) non-coding RNAs that are involved in post-transcriptional regulation of gene expression in multicellular organisms by affecting both the stability and translation of mRNAs. miRNAs are transcribed by RNA polymerase II as part of capped and polyadenylated primary transcripts (pri-miRNAs) that can be either protein-coding or non-coding. The primary transcript is cleaved by the Drosha ribonuclease III enzyme to produce an approximately 70-nt stem-loop precursor miRNA (pre-miRNA), which is further cleaved by the cytoplasmic Dicer ribonuclease to generate the mature miRNA and antisense miRNA star (miRNA*) products. The mature miRNA is incorporated into a RNA-induced silencing complex (RISC), which recognizes target mRNAs through imperfect base pairing with the miRNA and most commonly results in translational inhibition or destabilization of the target mRNA. The RefSeq represents the predicted microRNA stem-loop. NA
FOSL2 2355 ENSG00000075426 FOS like 2, AP-1 transcription factor subunit The Fos gene family consists of 4 members: FOS, FOSB, FOSL1, and FOSL2. These genes encode leucine zipper proteins that can dimerize with proteins of the JUN family, thereby forming the transcription factor complex AP-1. As such, the FOS proteins have been implicated as regulators of cell proliferation, differentiation, and transformation. NA
GNRH1 2796 ENSG00000147437 gonadotropin releasing hormone 1 This gene encodes a preproprotein that is proteolytically processed to generate a peptide that is a member of the gonadotropin-releasing hormone (GnRH) family of peptides. Alternative splicing results in multiple transcript variants, at least one of which is secreted and then cleaved to generate gonadoliberin-1 and GnRH-associated peptide 1. Gonadoliberin-1 stimulates the release of luteinizing and follicle stimulating hormones, which are important for reproduction. Mutations in this gene are associated with hypogonadotropic hypogonadism. NA
PRSS1 5644 ENSG00000204983 protease, serine 1 This gene encodes a trypsinogen, which is a member of the trypsin family of serine proteases. This enzyme is secreted by the pancreas and cleaved to its active form in the small intestine. It is active on peptide linkages involving the carboxyl group of lysine or arginine. Mutations in this gene are associated with hereditary pancreatitis. This gene and several other trypsinogen genes are localized to the T cell receptor beta locus on chromosome 7. NA
RDH5 5959 ENSG00000135437 retinol dehydrogenase 5 This gene encodes an enzyme belonging to the short-chain dehydrogenases/reductases (SDR) family. This retinol dehydrogenase functions to catalyze the final step in the biosynthesis of 11-cis retinaldehyde, which is the universal chromophore of visual pigments. Mutations in this gene cause autosomal recessive fundus albipunctatus, a rare form of night blindness that is characterized by a delay in the regeneration of cone and rod photopigments. Alternative splicing results in multiple transcript variants. Read-through transcription also exists between this gene and the neighboring upstream BLOC1S1 (biogenesis of lysosomal organelles complex-1, subunit 1) gene. NA
CTD-2035E11.5 ENSG00000272144 ENSG00000272144 NA NA NA
NA NA ENSG00000272365 NA NA TRUE
TMEM133 83935 ENSG00000170647 transmembrane protein 133 There is evidence that this intronless gene is transcribed but the protein is predicted. The gene function is unknown. NA
AC005540.3 ENSG00000235852 ENSG00000235852 NA NA NA
NA NA ENSG00000261252 NA NA TRUE
PDE6C 5146 ENSG00000095464 phosphodiesterase 6C This gene encodes the alpha-prime subunit of cone phosphodiesterase, which is composed of a homodimer of two alpha-prime subunits and 3 smaller proteins of 11, 13, and 15 kDa. Mutations in this gene are associated with cone dystrophy type 4 (COD4). NA
LINC01089 338799 ENSG00000212694 long intergenic non-protein coding RNA 1089 NA NA
DPF3 8110 ENSG00000205683 double PHD fingers 3 This gene encodes a member of the D4 protein family. The encoded protein is a transcription regulator that binds acetylated histones and is a component of the BAF chromatin remodeling complex. Alternate splicing results in multiple transcript variants encoding different isoforms. NA
NR1H3 10062 ENSG00000025434 nuclear receptor subfamily 1 group H member 3 The protein encoded by this gene belongs to the NR1 subfamily of the nuclear receptor superfamily. The NR1 family members are key regulators of macrophage function, controlling transcriptional programs involved in lipid homeostasis and inflammation. This protein is highly expressed in visceral organs, including liver, kidney and intestine. It forms a heterodimer with retinoid X receptor (RXR), and regulates expression of target genes containing retinoid response elements. Studies in mice lacking this gene suggest that it may play an important role in the regulation of cholesterol homeostasis. Alternatively spliced transcript variants encoding different isoforms have been found for this gene. NA
RAB37 326624 ENSG00000172794 RAB37, member RAS oncogene family Rab proteins are low molecular mass GTPases that are critical regulators of vesicle trafficking. For additional background information on Rab proteins, see MIM 179508. NA
GPR84 53831 ENSG00000139572 G protein-coupled receptor 84 NA NA
RP11-333E13.2 ENSG00000250568 ENSG00000250568 NA NA NA
RP11-862L9.3 ENSG00000266844 ENSG00000266844 NA NA NA
ZSWIM4 65249 ENSG00000132003 zinc finger SWIM-type containing 4 NA NA
C2orf82 389084 ENSG00000182600 chromosome 2 open reading frame 82 NA NA
HSP90AA2P ENSG00000224411 ENSG00000224411 heat shock protein 90kDa alpha family class A member 2, pseudogene NA NA
AC079305.10 ENSG00000222043 ENSG00000222043 NA NA NA
LOC171391 171391 ENSG00000255284 uncharacterized LOC171391 NA NA
RP11-408O19.5 ENSG00000271631 ENSG00000271631 NA NA NA
# Save the queried Ensembl gene IDs for this factor, one per line (gene_names_clus_20.txt).
write.table(out$query, paste0("../utilities/GTEX2013_sparse_fac_voom/gene_names_clus_", 20, ".txt"),
            col.names = FALSE, row.names = FALSE, quote = FALSE)
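Several rows in the table above (for example ENSG00000269942) have no symbol or name and carry TRUE in the last column, which appears to be the not-found flag returned by the query. If only matched genes are wanted in the exported list, one option is to drop those rows before writing. The sketch below is a minimal illustration on a toy data frame. The column name `notfound`, the object `out_df`, and the temporary output file are assumptions made for the example and are not part of the original analysis.

```r
# Toy stand-in for the annotation table; the column name `notfound` is an assumption.
out_df <- data.frame(
  query    = c("ENSG00000100644", "ENSG00000269942", "ENSG00000073605"),
  symbol   = c("HIF1A", NA, "GSDMB"),
  notfound = c(NA, TRUE, NA),
  stringsAsFactors = FALSE
)

# A row counts as matched when the flag is NA (not set) or FALSE.
found <- is.na(out_df$notfound) | !out_df$notfound
matched_ids <- out_df$query[found]

# Write one Ensembl ID per line; a temporary file is used here for illustration only.
out_file <- tempfile(fileext = ".txt")
write.table(matched_ids, out_file,
            col.names = FALSE, row.names = FALSE, quote = FALSE)
```

Whether unmatched IDs should be kept depends on the downstream tool: some enrichment tools accept raw Ensembl IDs regardless of annotation status, in which case writing the full `query` column as above is the simpler choice.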